Many important problems in psychology and biomedical studies
require testing for overdispersion, correlation and heterogeneity
in mixed effects and latent variable models, and score tests are par- 
ticularly useful for this purpose. But the existing testing procedures 
depend on restrictive assumptions. In this paper we propose a class 
of test statistics based on a general mixed effects model to test the 
homogeneity hypothesis that all of the variance components are zero. 
Under some mild conditions, not only do we derive asymptotic distri- 
butions of the test statistics, but also propose a resampling procedure 
for approximating their asymptotic distributions conditional on the 
observed data. To overcome the technical challenge, we establish an 
invariance principle for random quadratic forms indexed by a param- 
eter. A simulation study is conducted to investigate the empirical 
performance of the test statistics. A real data set is analyzed to il- 
lustrate the application of our theoretical results. 

1. Introduction. Mixed effects and latent variable models provide an
attractive framework to accommodate correlated data. For example, structural
equation models and generalized linear mixed models (GLMMs) are commonly
used in behavioral, educational and social sciences (e.g., [2, 3]). A
fundamental question in mixed effects or latent variable models is whether 
or not the inclusion of the random effects or latent variables is necessary. 
Many authors have examined this important issue using score test statistics 
in the framework of the GLMMs; see [8, 16, 21, 22, 32], for example. How- 
ever, those authors did not fully exploit the general correlation structure of 
the random effects (or latent variables). 
Suppose that we observe data from $n$ units and within the $i$th unit we
have $m_i$ measurements, $i = 1, \ldots, n$. This is a typical data structure in
longitudinal and family studies that are popular in social and biomedical studies.
In longitudinal studies, the unit is usually a person or an animal. In fam- 
ily studies, the unit is generally a family. In addition to the following two 
examples, other examples can be found in [38]. 

Example 1 (Segregation analysis of ordinal traits). To study the genetic
inheritance pattern of many health conditions such as cancer and psychiatric
disorders, Zhang, Feng and Zhu [34] proposed a general framework for
conducting complex segregation analysis of ordinal traits based on the latent
variable model of Zhang and Merikangas [35]. Let $Y_i = (y_{i,1}, \ldots, y_{i,m_i})^T$ be a
vector of traits and $X_i$ the covariates from the $i$th family, $i = 1, \ldots, n$.
Without loss of generality, suppose that $y_{i,j}$ assumes ordinal values 0, 1 or 2. To
model the potential familial correlation, they introduced a latent variable
vector $v_i$ for each family to represent common unmeasured environmental
and genetic factors shared by family members. Conditional on the $\{v_i\}$,
the $y_{i,j}$'s are assumed to be independent and follow the proportional odds
logistic model given by

    $\mathrm{logit}\, P\{y_{i,j} = 0 \mid v_i\} = x_{i,j}^T \beta + \alpha_0 + b_{i,j}$,
(1)
    $\mathrm{logit}\, P\{y_{i,j} \le 1 \mid v_i\} = x_{i,j}^T \beta + \alpha_1 + b_{i,j}$,

where $\alpha_0 < \alpha_1$, $b_{i,j}$ depends on $\{v_i\}$ and $X_i$; $x_{i,j}^T$ is a covariate vector in the
design matrix $X_i = (x_{i,1}, \ldots, x_{i,m_i})^T$ ($m_i \times q_1$) from the $j$th member in the
$i$th family. An important objective in collecting family data is to test familial
aggregation and inheritance, which can be achieved by testing $\mathrm{var}[b_{i,j}] = 0$
for all $i$ and $j$.

Example 2 (Generalized linear mixed effects model). Consider a data
set that is composed of a response $y_{i,j}$, covariate vectors $x_{i,j}$ and $z_{i,j}$ for
observations $j = 1, \ldots, m_i$ within clusters $i = 1, \ldots, n$. We define the
generalized linear mixed effects models as

(2) $p(y_{i,j} \mid b_i) = \exp[\phi\{y_{i,j}\theta_{i,j} - a(\theta_{i,j})\} + c(y_{i,j}, \phi)]$

and $\mu_{i,j} = E(y_{i,j} \mid b_i) = g(x_{i,j}^T\beta + z_{i,j}^T b_i)$, where $a(\cdot)$, $c(\cdot)$ and $g(\cdot)$ are known
continuously differentiable functions. The random coefficients $b_i$'s ($q \times 1$)
are normally distributed such that $E[b_i] = 0$ and $E[b_i b_i^T] = \Sigma$. Moreover, for
$i \neq i'$, $b_i$ and $b_{i'}$ are independent of each other. The so-called homogeneity
test is to test whether $\Sigma = 0$.

To summarize the two examples presented above, we consider the following
mixed effects model. We use $(y_{i,j}, x_{i,j}, z_{i,j})$ to denote the $j$th observation
in the $i$th cluster. The total number of observations is $N = \sum_{i=1}^{n} m_i$.
Furthermore, we assume that for each $Y_i$, there exists an unobserved $q \times 1$ latent
variable (or random effect) vector $b_i$. Given $\{b_i; i = 1, \ldots, n\}$, the
components of $\{Y_i; i = 1, \ldots, n\}$ are independent random variables and have the
joint probability density function

(3) $p(Y_i \mid b_i) = \prod_{j=1}^{m_i} p(y_{i,j} \mid \psi_{i,j}(b_i; \beta, \gamma_{(1)}), \Phi)$,

where $\psi_{i,j}(b_i; \beta, \gamma_{(1)}) = g(x_{i,j}^T\beta; f_{i,j}(z_{i,j}, \gamma_{(1)})^T b_i)$ and $\Phi$ is a dispersion
parameter vector. In addition, $g(\cdot)$ is a known link function and $f_{i,j}(\cdot)$ is a
$q \times 1$ vector function, and $\beta$ and $\gamma_{(1)}$ are, respectively, $q_1 \times 1$ and $q_3 \times 1$
vectors. The unobserved random variables, $b_i$, satisfy $E[b_i] = 0$ and $E[b_i b_{i'}^T] =
\Sigma_{i,i'}(\gamma)$, where $\gamma$ is a $q_2 \times 1$ vector. Model (3) also includes the factor analysis
model and the random coefficient model, in which $f_{i,j}(\cdot,\cdot)$ may depend on
unknown parameters. Hereafter, we include $\gamma_{(1)}$ in $\gamma$ for notational simplicity.

We are interested in testing the homogeneity hypotheses

(4) $H_0: \Sigma_{i,i'}(\gamma) = 0$ for all $i, i'$ vs. $H_1: \Sigma_{i,i'}(\gamma) \neq 0$ for some $i, i'$.

We generally conduct the omnibus testing in (4) , because it is easy to control 
its type I error. If the null hypothesis is rejected, it is interesting to find 
out which components are nonzero. While the details warrant a separate 
investigation, the results presented here will be useful for testing that some 
parameters in (4) equal zero. 

To test the homogeneity hypotheses in (4), we need to address the follow- 
ing four issues: (a) a convenient parameterization for the homogeneity test; 
(b) the construction of a score test statistic; (c) the asymptotic distribution 
of the score test statistic under the null hypothesis; and (d) the computation 
of the p-value from the asymptotic distribution. 

The solution to the first issue on the parameterization lays the foundation
for resolving the subsequent issues. Let us examine a simple case of Example
2 with $q = 2$. We write the covariance matrix of $b_i$, $\Sigma$, as

(5) $\Sigma = \begin{pmatrix} \sigma_1^2 & \rho\sigma_1\sigma_2 \\ \rho\sigma_1\sigma_2 & \sigma_2^2 \end{pmatrix} = \sigma_T \begin{pmatrix} \cos^2(\gamma_1) & \gamma_2 \sin(\gamma_1)\cos(\gamma_1) \\ \gamma_2 \sin(\gamma_1)\cos(\gamma_1) & \sin^2(\gamma_1) \end{pmatrix}$,

where $\sigma_T = \sigma_1^2 + \sigma_2^2$ and $(\sigma_1^2/\sigma_T, \sigma_2^2/\sigma_T) = (\cos^2(\gamma_1), \sin^2(\gamma_1))$. We see that
the null hypothesis in (4) is equivalent to $\Sigma(\gamma) = 0$, that is, $\sigma_T = 0$. The
first and second derivatives of the log-likelihood function with respect to all
parameters $\sigma_1$, $\sigma_2$ and $\rho$ in $\Sigma(\gamma)$ are not continuous when $\Sigma(\gamma) = 0$; however,
they are continuous in $\sigma_T$ at $\sigma_T = 0$ [1]. In this simple case, we simply test
$\sigma_T = 0$ and treat the other parameters as nuisance parameters.
When $q > 2$, we consider a lower triangular Cholesky decomposition of
$\Sigma$, denoted by $L = (\ell_{i,j})$, which satisfies $\ell_{i,i} \ge 0$ for all $i = 1, \ldots, q$ and
$\ell_{i,j} = 0$ for $i < j$. Furthermore, we define $L = \Lambda\Gamma$, where
$\Lambda = \mathrm{diag}(\ell_{1,1}, \ldots, \ell_{q,q}) / \sqrt{\sum_{i=1}^{q} \ell_{i,i}^2}$ and $\Gamma = (\gamma_{i,j})$ is a
$q \times q$ lower triangular matrix with $\gamma_{i,i} = 1$ for $i = 1, \ldots, q$.
Let $\sigma_T = \sum_{i=1}^{q} \ell_{i,i}^2$. Then $\Sigma$ can be written as $\Sigma = \sigma_T \Lambda\Gamma\Gamma^T\Lambda$.
Thus, the null hypothesis in (4) is equivalent to $\sigma_T = 0$.
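
As a quick numerical illustration of the $q = 2$ parameterization, the following minimal Python sketch maps $(\sigma_1^2, \sigma_2^2, \rho)$ to $(\sigma_T, \gamma_1, \gamma_2)$ and back. The function names are hypothetical and chosen only for this illustration; the identification $\gamma_2 = \rho$ follows from comparing the off-diagonal entries of the two matrices in (5).

```python
import math


def to_gamma(var1, var2, rho):
    """Map (sigma_1^2, sigma_2^2, rho) to (sigma_T, gamma_1, gamma_2)."""
    sigma_t = var1 + var2
    # cos^2(gamma_1) = var1 / sigma_T and sin^2(gamma_1) = var2 / sigma_T
    gamma1 = math.atan2(math.sqrt(var2), math.sqrt(var1))
    return sigma_t, gamma1, rho


def sigma_q2(sigma_t, gamma1, gamma2):
    """Rebuild the 2 x 2 covariance matrix from (sigma_T, gamma_1, gamma_2)."""
    c, s = math.cos(gamma1), math.sin(gamma1)
    off = gamma2 * s * c
    return [[sigma_t * c * c, sigma_t * off],
            [sigma_t * off, sigma_t * s * s]]


sigma_t, gamma1, gamma2 = to_gamma(1.0, 0.25, 0.5)
cov = sigma_q2(sigma_t, gamma1, gamma2)
# Setting sigma_T = 0 gives the zero matrix whatever the nuisance values are.
null_cov = sigma_q2(0.0, gamma1, gamma2)
```

The point of the sketch is that the whole matrix vanishes exactly when the single scalar $\sigma_T$ does, while $(\gamma_1, \gamma_2)$ remain free.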

To our knowledge, there are no satisfactory solutions to the remaining 
three issues. For example, Chen, Chen and Kalbfleisch [5, 6], Chen and 
Chen [4], Crainiceanu and Ruppert [10] and Zhu and Zhang [37] derived the 
asymptotic or small sample distributions of the likelihood ratio statistics 
for some specific mixed effects models under restrictive conditions. Others 
considered score test statistics. Liang [21] and Commenges and
Jacqmin-Gadda [8] considered the case when the random effect, $b_i$, is scalar. Although
Lin [22] and Hall and Præstgaard [16] considered a multidimensional $b_i$,
their $f_{i,j}(\cdot,\cdot)$ does not contain $\gamma_{(1)}$. In other words, the existing results do
not cover our examples and the general model (3). Thus, it is imperative for
us to develop score test statistics and establish the asymptotic theory under 
a more general framework. 

2. Score test statistics of homogeneity. 

2.1. Score test statistics. From now on, we write

(6) $\Sigma_{i,i'}(\gamma) = \sigma_T W_{i,i'}(\gamma)$ for all $i, i' = 1, \ldots, n$,

where $\sigma_T$ is introduced above. Under the parameterization (6), we formally
state the homogeneity hypotheses as

(7) $H_0: \sigma_T = 0$ vs. $H_1: \sigma_T > 0$.

Letting $u_i = \sigma_T^{-1/2} b_i$, we see that $E[u_i] = 0$ and $E[u_i u_{i'}^T] = W_{i,i'}(\gamma)$. Thus,
the log-likelihood function $\ell_n(\sigma_T \mid \beta, \gamma, \Phi)$ is given by

    $\log\Bigl\{ \int \prod_{i=1}^{n} \prod_{j=1}^{m_i} p(y_{i,j} \mid \psi_{i,j}(x_{i,j}^T\beta; f_{i,j}(z_{i,j}, \gamma_{(1)})^T u_i \sigma_T^{1/2}), \Phi) \, dF(u_1, \ldots, u_n \mid \gamma) \Bigr\}$,

where $F(u_1, \ldots, u_n \mid \gamma)$ is the distribution function of $(u_1, \ldots, u_n)$. Let $t_{i,j} =
\sigma_T^{1/2} \eta_{i,j}$, where $\eta_{i,j} = f_{i,j}(z_{i,j}, \gamma_{(1)})^T u_i$. Similarly to Liang [21], we can show
that the first-order right derivative of $\ell_n(\sigma_T \mid \beta, \gamma, \Phi)$ at $\sigma_T = 0$, denoted by
$T_S(\gamma \mid \beta, \Phi)$, is given by



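    $0.5 \int \Bigl\{ \Bigl[ \sum_{i=1}^{n} \sum_{j=1}^{m_i} \frac{\partial \log p(y_{i,j} \mid \psi_{i,j}(x_{i,j}^T\beta; t_{i,j}))}{\partial t_{i,j}} \eta_{i,j} \Bigr]^2 + \sum_{i=1}^{n} \sum_{j=1}^{m_i} \frac{\partial^2 \log p(y_{i,j} \mid \psi_{i,j}(x_{i,j}^T\beta; t_{i,j}))}{\partial t_{i,j}^2} \eta_{i,j}^2 \Bigr\} \, dF(u_1, \ldots, u_n \mid \gamma)$;

see [38] for a detailed derivation. We will describe later how to estimate
$(\beta, \Phi)$, but for the time being, let us treat them as if they were known
and not include them as parameters to simplify the notation. That is, let
$T_S(\gamma) = T_S(\gamma \mid \beta, \Phi)$. If $\gamma$ is actually absent in all of the $W_{i,i'}(\gamma)$'s, $T_S(\gamma)$ is a
score test statistic identical to that proposed by Liang [21] and Commenges
and Jacqmin-Gadda [8]. In general, however, $T_S(\gamma)$ is not really a score
statistic due to the presence of $\gamma$.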

Let $b_{K,K'}(\gamma)$ be $f_{i,j}(z_{i,j}, \gamma_{(1)})^T W_{i,i'}(\gamma) f_{i',j'}(z_{i',j'}, \gamma_{(1)})$ and $B(\gamma) =
(b_{K,K'}(\gamma))$ be an $N \times N$ matrix, where $K = (i,j)$ and $K' = (i',j')$. With
this notation, $T_S(\gamma)$ can be written as

(8) $T_S(\gamma) = U^T B(\gamma) U - \mathrm{tr}[V B(\gamma)]$,

where $U$ and $V$ are, respectively, an $N \times 1$ vector and an $N \times N$ matrix.
Let $U_{i,j}$ and $V_{i,j}$ be the limits of $\partial \log p(y_{i,j} \mid \psi_{i,j}(x_{i,j}^T\beta; t_{i,j}))/\partial t_{i,j}$ and
$-\partial^2 \log p(y_{i,j} \mid \psi_{i,j}(x_{i,j}^T\beta; t_{i,j}))/\partial t_{i,j}^2$, respectively, as $t_{i,j} \to 0$. The $K$th
element of $U$ is $U_{i,j}$, and $V$ is a diagonal matrix with $K$th diagonal element $V_{i,j}$.

Following Commenges and Jacqmin-Gadda [8], we can decompose $T_S(\gamma)$
into two terms,

    $T_S(\gamma) = T_P(\gamma) + T_O(\gamma)$,   $T_O(\gamma) = \sum_K b_{K,K}(\gamma)(U_K^2 - V_K)$,
    $T_P(\gamma) = U^T\{B(\gamma) - \mathrm{diag}[B(\gamma)]\}U = \sum_{K \neq K'} b_{K,K'}(\gamma) U_K U_{K'}$,

where $\mathrm{diag}[B(\gamma)]$ is the $N \times N$ diagonal matrix of $B(\gamma)$. The first term
$T_P(\gamma)$ is called a pairwise correlation term and the second term $T_O(\gamma)$ is an
overdispersion term. Under the null hypothesis $H_0$, we have

    $E[T_P(\gamma)] = E[T_O(\gamma)] = E[T_P(\gamma)T_O(\gamma')] = 0$,
    $E[T_S(\gamma)T_S(\gamma')] = E[T_P(\gamma)T_P(\gamma')] + E[T_O(\gamma)T_O(\gamma')]$,
    $E[T_P(\gamma)T_P(\gamma')] = 2 \sum_{K \neq K'} b_{K,K'}(\gamma) b_{K,K'}(\gamma') E U_K^2 \, E U_{K'}^2$,
    $E[T_O(\gamma)T_O(\gamma')] = \sum_K b_{K,K}(\gamma) b_{K,K}(\gamma') [E U_K^4 + E V_K^2 - 2E(U_K^2 V_K)]$.
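
The split of the quadratic form into a pairwise part and a diagonal (overdispersion) part can be checked numerically. The sketch below is only an illustration with made-up numbers: `U`, `V` (stored as the diagonal of the matrix $V$) and `B` are hypothetical inputs, and `score_terms` is a name introduced here, not something defined in the paper.

```python
def score_terms(U, V, B):
    """Return (T_S, T_P, T_O) for scores U, the diagonal of V, and matrix B."""
    n = len(U)
    # Pairwise correlation term: off-diagonal part of the quadratic form.
    t_p = sum(B[k][l] * U[k] * U[l]
              for k in range(n) for l in range(n) if k != l)
    # Overdispersion term: diagonal part, centred by V.
    t_o = sum(B[k][k] * (U[k] ** 2 - V[k]) for k in range(n))
    # Full statistic computed directly as U^T B U - tr(V B).
    quad = sum(B[k][l] * U[k] * U[l] for k in range(n) for l in range(n))
    trace = sum(V[k] * B[k][k] for k in range(n))
    return quad - trace, t_p, t_o


U = [1.0, -2.0, 0.5]
V = [0.5, 1.0, 2.0]
B = [[2.0, 1.0, 0.0],
     [1.0, 3.0, 0.5],
     [0.0, 0.5, 1.0]]
t_s, t_p, t_o = score_terms(U, V, B)
```

For these toy inputs the directly computed statistic agrees with the sum of the two parts, which is all the decomposition claims.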

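We construct three score test statistics in the following. We first define

(10) $X_P(\gamma) = \frac{T_P(\gamma)}{\sqrt{I_{TP}(\gamma)}}$,  $X_O(\gamma) = \frac{T_O(\gamma)}{\sqrt{I_{TO}(\gamma)}}$  and  $X_S(\gamma) = \frac{T_S(\gamma)}{\sqrt{I_{TS}(\gamma)}}$,

where $I_{TO}(\gamma)$, $I_{TS}(\gamma)$ and $I_{TP}(\gamma)$ are the variances of $T_O(\gamma)$, $T_S(\gamma)$ and
$T_P(\gamma)$, respectively. However, we need to estimate $\xi = (\beta, \Phi)$ in $X_P(\gamma)$, $X_O(\gamma)$
and $X_S(\gamma)$ for testing and replace $\xi$ by its estimator $\hat\xi$. Let $\hat U_K$ and $\hat V_K$ denote
the values of $U_K$ and $V_K$ evaluated at $\hat\xi$, respectively, which gives $\hat T_O(\gamma) =
\sum_K b_{K,K}(\gamma)(\hat U_K^2 - \hat V_K)$, $\hat T_P(\gamma) = \sum_{K \neq K'} b_{K,K'}(\gamma) \hat U_K \hat U_{K'}$ and $\hat T_S(\gamma) = \hat T_P(\gamma) +
\hat T_O(\gamma)$. We introduce

(11) $\hat X_P(\gamma) = \frac{\hat T_P(\gamma)}{\sqrt{I_{EP}(\gamma)}}$,  $\hat X_O(\gamma) = \frac{\hat T_O(\gamma)}{\sqrt{I_{EO}(\gamma)}}$  and  $\hat X_S(\gamma) = \frac{\hat T_S(\gamma)}{\sqrt{I_{ES}(\gamma)}}$,

where $I_{EO}(\gamma)$, $I_{ES}(\gamma)$ and $I_{EP}(\gamma)$ are the asymptotic variances of $\hat T_O(\gamma)$,
$\hat T_S(\gamma)$ and $\hat T_P(\gamma)$, respectively, with $\xi$ evaluated at $\hat\xi$. Assume that

(12) $N^{1/2}(\hat\xi - \xi_*) = N^{-1/2} \sum_K F_K + o_p(1)$,

where $\xi_*$ is the true value of $\xi$ and $F_K$ is a random function of $(y_{i,j}, x_{i,j}, z_{i,j})$.
In addition, $F_K$ and $F_{K'}$ are independent of each other for $K \neq K'$. With
these preparations, we can show that $\hat T_O(\gamma) = \sum_K b_{K,K}(\gamma)(\hat U_K^2 - \hat V_K) \approx
\sum_K \{b_{K,K}(\gamma)(U_K^2 - V_K) - J_N(\gamma)^T F_K\}$ under some mild conditions, where
$J_N(\gamma) = E[-N^{-1} \partial_\xi T_O(\gamma)]$. Furthermore, $I_{EO}(\gamma)$

can be approximated by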

    $I_{TO}(\gamma) - 2 \sum_K b_{K,K}(\gamma) E[(U_K^2 - V_K) J_N(\gamma)^T F_K] + \sum_K E[\{J_N(\gamma)^T F_K\}^2]$.

Because of the one-sided constraint $\sigma_T \ge 0$, we consider $\hat X_P(\gamma) 1(\hat X_P(\gamma) > 0)$,
$\hat X_O(\gamma) 1(\hat X_O(\gamma) > 0)$ and $\hat X_S(\gamma) 1(\hat X_S(\gamma) > 0)$, where $1(A)$ is the indicator
function of the event $A$. Furthermore, to remove the unknown $\gamma$, we
introduce the maximum statistics defined by $S_O = \sup_\gamma \{\hat X_O(\gamma)^2 1(\hat X_O(\gamma) >
0)\}$, $S_P = \sup_\gamma \{\hat X_P(\gamma)^2 1(\hat X_P(\gamma) > 0)\}$ and $S_S = \sup_\gamma \{\hat X_S(\gamma)^2 1(\hat X_S(\gamma) >
0)\}$. In practice, the null hypothesis is rejected if any of these three statistics
$\{S_O, S_P, S_S\}$ has a large value.

As a common practice, the foregoing use of the maximum of the score test
statistics is based on power considerations (see, e.g., [14]). Because $\gamma$ is
identifiable under the alternative hypothesis only, the maximization over $\gamma$ takes
effect under the alternative hypothesis, as for the likelihood ratio test (LRT).
We show in [38] that $S_S$ yields an efficient test statistic because it recovers
information from the likelihood under the alternative hypothesis.
Furthermore, we show that the score test statistic proposed here is asymptotically
equivalent to the LRT for testing the homogeneity of random effects; see
Theorems S.1 and S.2 in [38].

By now we have defined three score statistics for testing homogeneity un- 
der mixed effects models, but we will discuss their asymptotic null distribu- 
tions in Section 3. Similar to Lin's [22] method, an important feature of our 
score statistics is that we only need to specify the first and second moments 
of the latent variables in (8) for the distribution function $F(b_1, \ldots, b_n; \gamma)$.
Thus, the test statistics are expected to be robust with respect to the
distribution of the random effects. In addition, our test statistics allow a general
covariance structure of the latent variables, and $f_{i,j}(\cdot,\cdot)$ may depend on
unknown parameters.

As we know, the optimality of a test depends on its power. To compare 
the power of $S_O$, $S_P$ and $S_S$ with that of Lin [22], we consider sequences
of local alternatives to $\sigma_T = 0$. The asymptotic local power for $S_S$ follows
from Theorem 2 in [14]; see Theorems S.1-S.4 in [38] for details. Empirically,
simulations in Section 4 will demonstrate that the score statistic $S_S$ proposed
here is more powerful than the score statistic proposed by Lin [22] (see Tables
1 and 2).

2.2. A resampling procedure. To assess the power of the three test
statistics $\{S_S, S_P, S_O\}$, we need to obtain empirical distributions for the score
statistics in lieu of their theoretical distributions. The following four
steps generate stochastic processes that have the same asymptotic
distributions as the test statistics.

Step 1. We generate i.i.d. random samples, $\{v^{(r)}_{K,K'} : K, K' = (i,j), j =
1, \ldots, m_i, i = 1, \ldots, n\}$, from $N(0, 1)$. Here, the superscript $(r)$ represents
a replication number.

Step 2. We calculate $\hat T_S^{(r)}(\gamma) = \hat T_P^{(r)}(\gamma) + \hat T_O^{(r)}(\gamma)$ and

(13) $\hat T_P^{(r)}(\gamma) = \sqrt{2} \sum_{K \neq K'} v^{(r)}_{K,K'} b_{K,K'}(\gamma) \hat U_K \hat U_{K'}$,
     $\hat T_O^{(r)}(\gamma) = \sum_K v^{(r)}_{K,K} \{b_{K,K}(\gamma)(\hat U_K^2 - \hat V_K) - J_N(\gamma)^T \hat F_K\}$,

where $\hat F_K$ is an estimator of $F_K$ evaluated at $\hat\xi$. Then, we can calculate

    $\hat X_S^{(r)}(\gamma) = \frac{\hat T_S^{(r)}(\gamma)}{\sqrt{I_{ES}(\gamma)}}$,  $\hat X_P^{(r)}(\gamma) = \frac{\hat T_P^{(r)}(\gamma)}{\sqrt{I_{EP}(\gamma)}}$  and  $\hat X_O^{(r)}(\gamma) = \frac{\hat T_O^{(r)}(\gamma)}{\sqrt{I_{EO}(\gamma)}}$.

It is important to note that conditional on the observed data, $\hat X_S^{(r)}(\gamma)$,
$\hat X_P^{(r)}(\gamma)$ and $\hat X_O^{(r)}(\gamma)$ converge weakly to the three Gaussian processes
described in Theorem 2 as $N \to \infty$ (see Section 3). This can be shown using
the conditional functional central limit theorem; see Section 3 for details.

Step 3. We calculate the three test statistics

    $S_S^{(r)} = \sup_{\gamma \in \Gamma} \{\hat X_S^{(r)}(\gamma)^2 1(\hat X_S^{(r)}(\gamma) > 0)\}$,  $S_P^{(r)} = \sup_{\gamma \in \Gamma} \{\hat X_P^{(r)}(\gamma)^2 1(\hat X_P^{(r)}(\gamma) > 0)\}$

and

    $S_O^{(r)} = \sup_{\gamma \in \Gamma} \{\hat X_O^{(r)}(\gamma)^2 1(\hat X_O^{(r)}(\gamma) > 0)\}$.

Step 4. We repeat the above three steps $r_0$ times and obtain three
realizations: $\{S_S^{(r)} : r = 1, \ldots, r_0\}$, $\{S_P^{(r)} : r = 1, \ldots, r_0\}$ and $\{S_O^{(r)} : r = 1, \ldots, r_0\}$. It
can be shown that the empirical distribution of $S_S^{(r)}$ converges to the
asymptotic distribution of $S_S$. Similarly, $S_O^{(r)}$ and $S_P^{(r)}$ converge to the asymptotic
distributions of $S_O$ and $S_P$, respectively. Therefore, the empirical
distributions of these three realizations form the basis for calculating the significance
level and power of the tests.
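
The four steps can be sketched in Python for the pairwise correlation statistic alone. This is a minimal illustration under stated assumptions, not the authors' implementation: the inputs `B_grid` (one matrix of $b_{K,K'}(\gamma)$ per grid point), `var_grid` (a stand-in for $I_{EP}(\gamma)$) and `U` (estimated scores) are hypothetical toy values, the plug-in variance used to fill `var_grid` is an assumption made only for this example, and the helper names are invented here.

```python
import math
import random


def sup_stat(xs):
    """Return the maximum over the grid of X^2 * 1(X > 0)."""
    return max((x * x if x > 0 else 0.0) for x in xs)


def t_pairwise(B, U, mult=None):
    """Sum over K != K' of b_{K,K'} U_K U_K', optionally times multipliers."""
    n = len(U)
    total = 0.0
    for k in range(n):
        for l in range(n):
            if k != l:
                w = 1.0 if mult is None else mult[k][l]
                total += w * B[k][l] * U[k] * U[l]
    return total


def resample_sp(B_grid, var_grid, U, r0, rng):
    """Return the observed S_P, the resampled S_P^(r) and a p-value."""
    n = len(U)
    observed = sup_stat(t_pairwise(B, U) / math.sqrt(v)
                        for B, v in zip(B_grid, var_grid))
    draws = []
    for _ in range(r0):
        # Step 1: one N(0, 1) multiplier per ordered pair, shared over the grid.
        mult = [[rng.gauss(0.0, 1.0) for _ in range(n)] for _ in range(n)]
        # Steps 2-3: multiplier version with the sqrt(2) factor, then the sup.
        draws.append(sup_stat(
            math.sqrt(2.0) * t_pairwise(B, U, mult) / math.sqrt(v)
            for B, v in zip(B_grid, var_grid)))
    # Step 4: the empirical distribution of the draws gives the p-value.
    p_value = sum(d >= observed for d in draws) / r0
    return observed, draws, p_value


# Toy input (hypothetical): two clusters with three observations each.
rng = random.Random(1)
U = [rng.gauss(0.0, 1.0) for _ in range(6)]
z = [rng.gauss(0.0, 1.0) for _ in range(6)]
cluster = [0, 0, 0, 1, 1, 1]
gammas = [0.25, 0.5, 0.75]
B_grid = [[[(g + (1.0 - g) * z[k] * z[l]) if cluster[k] == cluster[l] else 0.0
            for l in range(6)] for k in range(6)] for g in gammas]
# Assumption: plug-in variance 2 * sum b^2 U_K^2 U_K'^2 standing in for I_EP.
var_grid = [2.0 * sum(B[k][l] ** 2 * U[k] ** 2 * U[l] ** 2
                      for k in range(6) for l in range(6) if k != l)
            for B in B_grid]
observed, draws, p_value = resample_sp(B_grid, var_grid, U, 200, rng)
```

The same multipliers are reused across all grid points within one replication, so each replicate is a draw of the whole process in $\gamma$ rather than of independent values at each grid point.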

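2.3. Example 2 (Continued). Let us revisit Example 2 to illustrate how
our test statistics can be applied. Using the parameterization in Section 1,
we see that $\Sigma(\gamma) = \sigma_T W(\gamma)$, and $W_{i,i'}(\gamma)$ equals $W(\gamma)$ for $i = i'$ and is
zero otherwise. In this case, $u_i = \sigma_T^{-1/2} b_i$, $\eta_{i,j} = z_{i,j}^T u_i$, $f_{i,j}(z_{i,j}, \gamma_{(1)}) = z_{i,j}$
and $t_{i,j} = \sigma_T^{1/2} \eta_{i,j} = z_{i,j}^T b_i$. Moreover, we define $\theta_{i,j}(t_{i,j}) = k(x_{i,j}^T\beta + t_{i,j})$ and
$\mu_{i,j}(t_{i,j}) = g(x_{i,j}^T\beta + t_{i,j})$ to emphasize the fact that they depend on $t_{i,j}$
explicitly. After some calculations, for model (2) we have $U_{i,j} = \phi e_{i,j} \dot k(x_{i,j}^T\beta +
t_{i,j})|_{t_{i,j}=0}$ and

    $V_{i,j} = \phi \ddot a(\theta_{i,j}(0)) \dot\theta_{i,j}(0)^2 - \phi e_{i,j} \ddot\theta_{i,j}(0)$,

where the dots denote differentiation, $\dot\theta_{i,j}(0) = \dot k(x_{i,j}^T\beta + t_{i,j})|_{t_{i,j}=0}$ and $e_{i,j} =
y_{i,j} - \mu_{i,j}(0) = y_{i,j} - \dot a(\theta_{i,j}(0))$. In addition, $b_{K,K'}(\gamma) = z_{i,j}^T W(\gamma) z_{i,j'}$ for
$K = (i,j)$ and $K' = (i,j')$.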

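Under the null hypothesis $H_0$, we use the first four central moments of
$y_{i,j}$ under the exponential-family distributions [19, 33] to get

    $E U_{i,j}^2 = \phi \ddot a(\theta_{i,j}(0)) \dot\theta_{i,j}(0)^2$,
    $E U_{i,j}^4 = \dot\theta_{i,j}(0)^4 \{3\phi^2 \ddot a(\theta_{i,j}(0))^2 + \phi a^{(4)}(\theta_{i,j}(0))\}$.

Thus, we have $I_{TS}(\gamma) = I_{TO}(\gamma) + I_{TP}(\gamma)$ and

    $I_{TP}(\gamma) = 2 \sum_{i=1}^{n} \sum_{j \neq j'} \{[z_{i,j}^T W(\gamma) z_{i,j'}]^2 \phi^2 \ddot a(\theta_{i,j}(0)) \ddot a(\theta_{i,j'}(0)) \dot\theta_{i,j}(0)^2 \dot\theta_{i,j'}(0)^2\}$,

    $I_{TO}(\gamma) = \sum_{i=1}^{n} \sum_{j=1}^{m_i} [z_{i,j}^T W(\gamma) z_{i,j}]^2 [E U_{i,j}^4 + E V_{i,j}^2 - 2E(U_{i,j}^2 V_{i,j})]$.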
i=ii=i 

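As discussed above, we need to replace $\beta$ and $\phi$ by their estimates under
$H_0$. The maximum likelihood estimate, $\hat\beta$, of $\beta$ satisfies

    $\hat\beta - \beta_* = \Bigl\{ \sum_{i=1}^{n} \sum_{j=1}^{m_i} \phi \ddot a(\theta_{i,j}(0)) \dot\theta_{i,j}(0)^2 x_{i,j} x_{i,j}^T \Bigr\}^{-1} \sum_{i=1}^{n} \sum_{j=1}^{m_i} U_{i,j} x_{i,j} + o_p(N^{-1/2})$,

which gives $F_K$ for each $K = (i,j)$. Moreover, we can calculate that

    $J_N(\gamma) = N^{-1} \sum_{i=1}^{n} \sum_{j=1}^{m_i} z_{i,j}^T W(\gamma) z_{i,j} \, \phi \{\ddot a(\theta_{i,j}(0)) \dot\theta_{i,j}(0) \ddot\theta_{i,j}(0) + a^{(3)}(\theta_{i,j}(0)) \dot\theta_{i,j}(0)^3\} x_{i,j}$

and $I_{EO}(\gamma) = I_{TO}(\gamma) - N^2 J_N(\gamma)^T \{\sum_{i=1}^{n} \sum_{j=1}^{m_i} \phi \ddot a(\theta_{i,j}(0)) \dot\theta_{i,j}(0)^2 x_{i,j} x_{i,j}^T\}^{-1} J_N(\gamma)$, in
which $\beta$ is replaced by $\hat\beta$. The strategy to deal with the unknown $\phi$ is similar.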

3. Asymptotic null distribution of score test statistics. In this section,
we study the asymptotic properties of $\{X_P(\gamma), \hat X_P(\gamma), \hat X_P^{(r)}(\gamma)\}$ under the
null hypothesis $H_0$. Note that the asymptotic distributions of $X_O(\gamma)$, $\hat X_O(\gamma)$
and $\hat X_O^{(r)}(\gamma)$ have been widely discussed in the literature [25]. The asymptotic
distribution of $\hat X_S^{(r)}(\gamma)$ follows from that of $\hat X_P^{(r)}(\gamma)$ and $\hat X_O^{(r)}(\gamma)$. We refer
to [38] for details on how to apply our asymptotic results in some specific
examples.

3.1. Asymptotic null distribution. We denote $\Rightarrow$ for weak convergence of
a sequence of stochastic processes indexed by $\gamma \in \Gamma$, where the parameter
space $\Gamma$ is a uniformly bounded convex compact subset of $R^{q_2}$. In addition,
the uniform metric is used to define the weak convergence. Moreover, for
a metric space $\{D, d\}$, we consider $BL_1(D)$ to be the space of real-valued
functions on $D$ with Lipschitz norm bounded by 1, that is, for any $h \in BL_1(D)$,
$\sup_{x \in D} |h(x)| \le 1$ and $|h(x) - h(y)| \le d(x, y)$. As discussed in [29],
as $N \to \infty$, a stochastic process, $X_P^{(r)}$, weakly converges to $G_P$ on $D$ if and
only if $\sup_{h \in BL_1(D)} |E h(X_P^{(r)}) - E h(G_P)| \to 0$.

We have the following theorems, but we defer the proofs of all theorems 
as well as the assumptions to the Appendix. 

Because $X_O(\gamma)$ can be regarded as the sum of independent but not
identically distributed random variables indexed by $\gamma$, the asymptotic distribution
of $X_O(\gamma)$ is a Gaussian process under some mild conditions by directly
applying the functional central limit theorem (FCLT) [25, 29]. Furthermore,
after examining the expressions for $X_P(\gamma)$ and $X_S(\gamma)$, we find that both
of them are random quadratic forms indexed by $\gamma$. Although the
asymptotic properties of random quadratic forms have been extensively studied
in the literature (e.g., [12, 24]), those results are not applicable to $X_S(\gamma)$
and $X_P(\gamma)$, because these stochastic processes are indexed by $\gamma$. Thus, an
invariance principle for the quadratic form process indexed by $\gamma$ needs to be
developed; see detailed discussion in Section 3.2.

Theorem 1. Under conditions (A1)-(A5) in the Appendix and the null
hypothesis $H_0$, $X_P(\cdot)$, $X_O(\cdot)$ and $X_S(\cdot)$ converge weakly to centered Gaussian
processes as $N \to \infty$.

Theorem 1 characterizes the asymptotic null distributions of the stochas- 
tic processes of interest and forms the foundation for constructing test statis- 
tics. 

Let us understand the asymptotic properties of $\hat X_O(\gamma)$, $\hat X_P(\gamma)$ and $\hat X_S(\gamma)$.
Because $\hat X_O(\gamma)$ is also asymptotically equivalent to the sum of independent
random variables, it converges to a Gaussian process under some mild
conditions (see Theorem 10.6 of [25]). For $\hat X_P(\gamma)$, under suitable conditions we can
show that $\hat X_P(\gamma) = \sum_{K \neq K'} b_{K,K'}(\gamma) \hat U_K \hat U_{K'} / \sqrt{I_{EP}(\gamma)} = X_P(\gamma) + o_p(1)$ and
$I_{EP}(\gamma) = I_{TP}(\gamma) + o_p(1)$. Thus, $\hat X_P(\gamma)$ and $X_P(\gamma)$ are asymptotically
equivalent. The asymptotic distribution of $\hat X_S(\gamma)$ can be established by noting
that it is a weighted sum of $\hat X_O(\gamma)$ and $\hat X_P(\gamma)$. To summarize our
discussions, we have the following theorem.

Theorem 2. Under conditions (B1)-(B8) in the Appendix and the null
hypothesis $H_0$, as $N \to \infty$, $\hat X_P(\cdot) \Rightarrow G_P(\cdot)$, $\hat X_O(\cdot) \Rightarrow G_O(\cdot)$ and $\hat X_S(\cdot) \Rightarrow
G_S(\cdot)$, where $G_P$, $G_O$ and $G_S$ are three centered Gaussian processes.

Theorem 2 delineates the asymptotic distributions of $\hat X_P(\gamma)$, $\hat X_O(\gamma)$ and
$\hat X_S(\gamma)$. In the generalized linear models, $\hat X_O(\gamma)$ is the same as several tests
for overdispersion [9]. In an example of a Bernoulli response variable,
Jacqmin-Gadda and Commenges [17] show that $\hat X_S(\gamma)$ is identical to the pairwise
correlation term.

To derive asymptotic null distributions of $S_O$, $S_P$ and $S_S$, we apply the
continuous mapping theorem and have the following corollary.

Corollary. Under the assumptions of Theorem 2, $S_O \to \sup_\gamma \{G_O(\gamma)^2
1(G_O(\gamma) > 0)\}$, $S_P \to \sup_\gamma \{G_P(\gamma)^2 1(G_P(\gamma) > 0)\}$ and $S_S \to \sup_\gamma \{G_S(\gamma)^2
1(G_S(\gamma) > 0)\}$, where $\to$ represents convergence in distribution as $N \to \infty$.
3.2. Asymptotic distribution of a random quadratic form. As noted above,
to understand the asymptotic null distribution of $\{X_P(\gamma), \hat X_P(\gamma), \hat X_P^{(r)}(\gamma)\}$,
we need to investigate the asymptotic properties of the random quadratic
forms indexed by $\gamma \in \Gamma$. For convenience, we will also use $K$ to index the
integers from 1 to $N$ as well as the pairs $(i,j)$ because there is a one-to-one
correspondence between $\{K = (i,j) : j = 1, \ldots, m_i, i = 1, \ldots, n\}$ and
$\{K = 1, \ldots, N\}$. Consider the quadratic form without diagonal terms

(14) $X_P(\gamma) = \sum_{K \neq K'} c_{K,K'}(\gamma) x_K x_{K'}$,

where $x_1, \ldots, x_N$ are a sequence of independent random variables such that
$E x_K = 0$ and $E x_K^2 = 1$ for all $K = 1, \ldots, N$. Note that the $c_{K,K'}(\gamma)$'s may
depend on $N$. We establish the asymptotic distribution of $X_P(\gamma)$ as follows.



Theorem 3. Under assumptions (C1)-(C4) in the Appendix, $X_P \Rightarrow
G_P$, where $G_P$ is a centered Gaussian process with covariance function $\rho_{(1)}(\gamma, \gamma')$,
as $N \to \infty$.

Theorem 3 establishes an invariance principle for a random quadratic
form indexed by $\gamma$; however, generalizing this result to a random quadratic
form indexed by an arbitrary index set warrants further investigation. To
simulate the asymptotic distribution of $X_P(\gamma)$, we consider the quadratic
form

(15) $X_P^{(r)}(\gamma) = \sqrt{2} \sum_{K \neq K'} c_{K,K'}(\gamma) x_K x_{K'} v^{(r)}_{K,K'}$,

where $\{v^{(r)}_{K,K'} : K, K' = 1, \ldots, N\}$ is a sequence of random variables defined
in Step 1 above. Let $E_V$ denote the expectation taken with respect to all
$v^{(r)}_{K,K'}$ conditional on the data.

Theorem 4. Assume that (C2)-(C4) in the Appendix are true and (C1)
holds for $p > 4$. Then $X_P^{(r)}(\cdot)$ converges weakly to the same Gaussian process
$G_P(\cdot)$ as $N \to \infty$; that is, $X_P^{(r)}$ is asymptotically measurable. In particular,
as $N \to \infty$, $\sup_{h \in BL_1(\ell^\infty(\Gamma))} |E_V h(X_P^{(r)}) - E h(G_P)| \to 0$, in probability.

One important feature of Theorem 4 is that we can use $X_P^{(r)}$ to
approximate the Gaussian process $G_P$. This theorem generalizes the
resampling technique from the independent but nonidentically distributed
framework [20] to the more general random quadratic setting. In particular, we
propose a practical resampling technique to simulate the asymptotic
distribution of $\hat X_P(\gamma)$.

To consider the process $\hat X_P(\gamma)$, we introduce a sequence of
independent random functions $\{U_1(s_1, \xi), \ldots, U_N(s_N, \xi)\}$ such that $U_K(s_K, \xi_*) =
x_K$. Furthermore, let $X_P(\gamma, \xi - \xi_*) = \sum_{K \neq K'} c_{K,K'}(\gamma) U_K(s_K, \xi) U_{K'}(s_{K'}, \xi)$.
We can easily see that $X_P(\gamma, 0) = X_P(\gamma)$ and $\hat X_P(\gamma) = X_P(\gamma, \hat\xi - \xi_*)$. In the
following, we will prove that if $\hat\xi = \xi_* + O_p(N^{-1/2})$, then as $N \to \infty$,

(16) $\hat X_P(\gamma) = X_P(\gamma, \hat\xi - \xi_*) = X_P(\gamma) + o_p(1)$.

Thus, the asymptotic distribution of $\hat X_P(\gamma)$ is the same as that of $X_P(\gamma)$ as
described in Theorem 3. The asymptotic distribution of $\hat X_P(\gamma)$ in Theorem 2
follows directly from (16). A sufficient condition for (16) is that

(17) $\sup_{\gamma \in \Gamma, \|h\|_2 \le M} |X_P(\gamma, hN^{-1/2}) - X_P(\gamma, 0)| = o_p(1)$

holds for any given $M > 0$, where $\|\cdot\|_2$ is the Euclidean norm. The following
theorem validates this sufficient condition.

Theorem 5. Under assumptions (C1)-(C8) in the Appendix and $\hat\xi =
\xi_* + O_p(N^{-1/2})$, $\hat X_P(\gamma)$ is asymptotically equivalent to $X_P(\gamma)$ as $N \to \infty$.
In particular, (17) is true.

Theorem 5 gives the exact conditions that guarantee the asymptotic
equivalence between $\hat X_P(\gamma)$ and $X_P(\gamma)$. Similarly, we can use $X_P^{(r)}(\gamma, \hat\xi -
\xi_*) = \sqrt{2} \sum_{K \neq K'} c_{K,K'}(\gamma) v^{(r)}_{K,K'} U_K(s_K, \hat\xi) U_{K'}(s_{K'}, \hat\xi)$ to approximate the
asymptotic distribution of $\hat X_P(\gamma)$. In particular, $X_P^{(r)}(\gamma, \hat\xi - \xi_*)$ has the same
form as $\hat X_P^{(r)}(\gamma)$ in Step 2 of Section 2.2. By using similar arguments to those
in Theorem 5, we can prove that $\hat X_P^{(r)}(\gamma) = X_P^{(r)}(\gamma) + o_p(1)$. As shown in
Theorem 4, $X_P^{(r)}$ converges to the process $G_P$ in distribution conditional on
the data. Combining Theorems 4 and 5, we can conclude that $\hat X_P^{(r)}(\gamma)$ has the
desired properties, which leads to the following corollary.

Corollary. Under assumptions (C1)-(C8), $\hat X_P^{(r)}(\gamma)$ is asymptotically
equivalent to $X_P^{(r)}(\gamma)$ and $\hat X_P^{(r)} \Rightarrow G_P$ conditional on the data.

4. Simulation study and a real example. There are two computational
issues related to our test procedures in Section 2. First, we need to replace $\xi$
by $\hat\xi$. In the following, we choose $\hat\xi$ to be the maximum likelihood estimate
obtained from the Newton-Raphson algorithm under the null hypothesis.
Second, the computation for generating the three realizations, as required
in Step 4 of the resampling process, is intensive. For instance, to
generate $\{S_S^{(r)} : r = 1, \ldots, r_0\}$, each $S_S^{(r)}$ entails a maximization process because
$S_S^{(r)} = \sup_{\gamma \in \Gamma} \{\hat X_S^{(r)}(\gamma)^2 1(\hat X_S^{(r)}(\gamma) > 0)\}$. To ease the computational burden,
we approximate $\Gamma$ by a grid $\Gamma_\Delta$.

4.1. A simulation study. In this section, we use simulations to compare
the performance of $S_P$, $S_O$ and $S_S$, and the test of Lin [22], denoted as LS.

The simulated data sets were drawn from two generalized linear mixed
models: the logistic mixed model and the linear mixed model. We assume
that the logistic mixed model has the form

(18) $\mathrm{logit}\, P(y_{i,j} = 1 \mid b_i) = 1.0 + 0.8 x_{i,j,1} + 0.5 x_{i,j,2} + (b_{i,1} + z_{i,j,1} b_{i,2})$.

The linear mixed model takes the form

(19) $y_{i,j} = 1.0 + x_{i,j,1} + x_{i,j,2} + (b_{i,1} + z_{i,j,1} b_{i,2}) + \varepsilon_{i,j}$,

where $\varepsilon_{i,j}$ and the random effects $b_i = (b_{i,1}, b_{i,2})^T$ are independent, and $\varepsilon_{i,j}$
follows a normal distribution with mean zero and variance $\sigma_e^2$. The $x_{i,j,1}$, $x_{i,j,2}$
and $z_{i,j,1}$ were simulated from a standard normal generator. The random
effects $b_i$ were simulated from a bivariate normal distribution with mean $(0, 0)$
and the $2 \times 2$ covariance matrix $\sigma_1^2 (1, \rho_1 \| \rho_1, \rho_2)$. Using the parameterization
(5), $\Gamma$ is given by $(0, \pi] \times [-\delta_0, \delta_0]$, where $\delta_0$ is any scalar in $[0, 1)$. We used
the grid $\Gamma_\Delta = \{(i\pi/20, j/16) : i = 1, \ldots, 20; j = -15, \ldots, 15\}$. The size of the
grid was based on computational feasibility and our empirical observation
that it appears large enough to perform well. In the resampling procedure,
$r_0$ was set to be 1,000.
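
For concreteness, the grid described above can be enumerated as in the following minimal Python sketch. The function name `build_grid` and its arguments are hypothetical; the defaults reproduce the index ranges stated in the text (20 angle values and 31 values of the second coordinate).

```python
import math


def build_grid(n_angle=20, n_corr=16):
    """Enumerate the points (i*pi/n_angle, j/n_corr) of the grid."""
    return [(i * math.pi / n_angle, j / n_corr)
            for i in range(1, n_angle + 1)
            for j in range(-(n_corr - 1), n_corr)]


grid = build_grid()
n_points = len(grid)
```

With the default arguments the grid has $20 \times 31 = 620$ points, so each resampling replicate requires 620 evaluations of the standardized statistic.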

For all score test statistics, we first compare the type I error under the null
hypothesis and the power under the alternative hypotheses. For the logistic
mixed model (18), we generated observations from the Bernoulli distribution
$B(1, P(y_{i,j} = 1 \mid b_i))$. We considered $n = 30$ and 50. Every unit contains 5
subjects ($m_i = 5$). We used correlations $\rho_1 = 0.5$ and $\rho_2 = 1.0$, and several
different values of $\sigma_1$. For the linear mixed model (19), we demonstrate the
gain of power by considering the correlation structure among the random
effects. The simulated data set contains 40 units ($n = 40$) of 4 subjects each. To generate
the random effects, we chose seven different values of $\sigma_1$ and two sets of
$(\rho_1, \rho_2)$ given by $(0.5, 1)$ and $(-0.3, 0.2)$.
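
A data set of the kind described for the logistic mixed model can be generated along the following lines. This is a sketch under assumptions rather than the authors' simulation code: it assumes the random-effect part of (18) is a random intercept plus a random slope on a single standard normal covariate, it draws the bivariate normal effects through a hand-written Cholesky factor, and the function name and seed are hypothetical.

```python
import math
import random


def simulate_logistic_cluster(m, sigma1, rho1, rho2, rng):
    """Simulate one cluster of m binary responses with two random effects."""
    # Cholesky factor of sigma1^2 * [[1, rho1], [rho1, rho2]].
    l11 = sigma1
    l21 = sigma1 * rho1
    l22 = sigma1 * math.sqrt(rho2 - rho1 ** 2)
    e1, e2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
    b1 = l11 * e1
    b2 = l21 * e1 + l22 * e2
    rows = []
    for _ in range(m):
        x1, x2, z1 = (rng.gauss(0.0, 1.0) for _ in range(3))
        # Assumed linear predictor: fixed part plus intercept and slope effects.
        eta = 1.0 + 0.8 * x1 + 0.5 * x2 + (b1 + z1 * b2)
        p = 1.0 / (1.0 + math.exp(-eta))
        y = 1 if rng.random() < p else 0
        rows.append((y, x1, x2, z1))
    return rows


rng = random.Random(2024)
# n = 30 units with m_i = 5 subjects, rho_1 = 0.5, rho_2 = 1.0, sigma_1 = 0.6.
data = [simulate_logistic_cluster(5, 0.6, 0.5, 1.0, rng) for _ in range(30)]
```

Setting `sigma1` to zero removes both random effects and so produces data under the null hypothesis.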

The results based on 10,000 replications are reported in Table 1. As ex- 
pected, a larger sample size improves the power of detecting heterogeneity. 
The rejection rate under the null hypothesis is close to the nominal level 
of 0.05 for the score test statistics. Under the alternative hypothesis, $S_S$
is slightly and consistently more powerful than $S_P$. This is because $S_S$
accounts for the overdispersion due to the latent variables, which is tested by
$S_O$. Table 1 also suggests that power is improved for all test statistics when
a general correlation structure is assumed. It is quite remarkable that, even
with relatively small sample sizes (30-50), the increment of power is still
evident. Not surprisingly, ignoring the correlation costs more power when
the correlation is actually high. We should clarify that we used a general
correlation structure to obtain the results in Table 1, instead of the
underlying correlation. Also important, in all cases $S_S$ is more powerful than LS,
and the difference is sometimes substantial. This observation is consistent 
with the fact that the likelihood ratio statistic under the constrained alter- 
native is uniformly more powerful than that for the unconstrained case and 
that there is an asymptotic equivalence between the likelihood ratio statis- 
tic and the score statistic under both the constrained and unconstrained 
alternatives [27, 28, 38]. 

Table 1

Comparison of type I error and the power for the score test statistics under models (18)
and (19) at significance level 0.05. "Considered" and "ignored" represent including or
excluding the correlation in the score test statistics

Logistic mixed model (18) and Bernoulli distribution

               n = 30 (m_i = 5)                 n = 50 (m_i = 5)
sigma_1    LS     S_O    S_P    S_S         LS     S_O    S_P    S_S
0.0        0.049  0.048  0.054  0.060       0.052  0.056  0.048  0.054
0.3        0.216  0.060  0.282  0.292       0.312  0.076  0.426  0.446
0.6        0.484  0.064  0.600  0.608       0.681  0.061  0.760  0.780
0.8        0.626  0.060  0.734  0.734       0.840  0.070  0.908  0.906
1.2        0.828  0.050  0.884  0.886       0.969  0.092  0.972  0.972

Linear mixed model (19)

Sigma = sigma_1^2 (1, 0.5 || 0.5, 1), n = 40 (m_i = 4)

               ignored                          considered
sigma_1    LS     S_O    S_P    S_S         LS     S_O    S_P    S_S
0.00       0.047  0.022  0.054  0.053       0.058  0.030  0.051  0.055
0.05       0.186  0.063  0.203  0.233       0.232  0.054  0.213  0.265
0.10       0.433  0.116  0.389  0.477       0.483  0.106  0.414  0.538
0.15       0.639  0.172  0.609  0.700       0.689  0.158  0.632  0.719
0.20       0.788  0.225  0.757  0.819       0.812  0.233  0.771  0.845
0.25       0.873  0.282  0.845  0.894       0.898  0.284  0.847  0.915
0.30       0.923  0.332  0.902  0.940       0.944  0.341  0.905  0.952

Sigma = sigma_1^2 (1, -0.3 || -0.3, 0.2), n = 40 (m_i = 4)

               ignored                          considered
sigma_1    LS     S_O    S_P    S_S         LS     S_O    S_P    S_S
0.05       0.113  0.026  0.140  0.153       0.121  0.040  0.146  0.165
0.10       0.217  0.031  0.289  0.309       0.267  0.039  0.313  0.332
0.15       0.397  0.038  0.475  0.496       0.452  0.058  0.479  0.518
0.20       0.586  0.043  0.638  0.646       0.606  0.081  0.639  0.675
0.25       0.605  0.051  0.751  0.770       0.736  0.094  0.762  0.795
0.30       0.797  0.053  0.848  0.849       0.828  0.114  0.853  0.880

We now examine the type I error of all score tests in the case of nuisance
overdispersion. For model (18), the binomial distribution, $B(5, P(y_{i,j} = 1 \mid b_{i,j}))$,
was included in this study to introduce large overdispersion. We perturbed
model (18) with random intercepts, which are independent of each other
and subject specific. Specifically, we added $\sigma_2 v_{ij}$ to the constant intercept 1
in model (18), where $v_{ij}$ was generated from $N(0, 1)$. For model (19), we
simulated random errors $\varepsilon_{ij}$ from a zero-mean normal distribution whose
variance is an exponential function involving $\sigma_2$. We set $\sigma_1 = 0$ and
chose several different values of $\sigma_2$.

The results based on several different values of $\sigma_2$ and 10,000 replications
are reported in Table 2. For binary data, the type I error is reasonably
controlled for the four statistics, even when $\sigma_2$ is large. In contrast, under the
binomial distribution, random intercepts lead to discrepancies between the
significance levels and the nominal level for Ss, So and LS, while the
performance of Sp remains reasonable. For Ss and So, the discrepancy is greater
for a larger $\sigma_2$. This is because So is suitable for testing overdispersion. It
is possible that Ss and So yield small p-values whereas Sp gives a large
p-value. In this situation, these p-values suggest the presence of
overdispersion instead of heterogeneity of the random effects. In addition, for the
binomial distribution, So is much more powerful than LS in detecting the 
nuisance overdispersion. For the linear mixed model, the heterogeneous
variance leads to an inflated type I error for LS, while the performance of Ss, Sp
and So remains stable.
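
The contrast between the Bernoulli and binomial rows of Table 2 can be illustrated numerically. The following is a minimal sketch, not the simulation code used for Table 2. It assumes an intercept-only logistic model, logit(p_ij) = 1 + sigma2 * v_ij with v_ij ~ N(0, 1), and the function name `dispersion_ratio` and all settings are hypothetical. It compares the sample variance of the responses with the variance that a Bernoulli or binomial model with the same mean would imply.

```python
import math
import random

def dispersion_ratio(sigma2, trials, n_obs=20000, seed=0):
    """Sample variance of y divided by the variance implied by a
    binomial(trials, p) model with the same mean (1.0 = no overdispersion).

    Assumption (not taken from the paper): an intercept-only logistic
    model, logit(p_ij) = 1 + sigma2 * v_ij with v_ij ~ N(0, 1).
    """
    rng = random.Random(seed)
    ys = []
    for _ in range(n_obs):
        eta = 1.0 + sigma2 * rng.gauss(0.0, 1.0)
        p = 1.0 / (1.0 + math.exp(-eta))
        # Binomial draw as a sum of Bernoulli draws that share the same p.
        ys.append(sum(rng.random() < p for _ in range(trials)))
    mean = sum(ys) / n_obs
    var = sum((y - mean) ** 2 for y in ys) / (n_obs - 1)
    p_bar = mean / trials
    return var / (trials * p_bar * (1.0 - p_bar))

if __name__ == "__main__":
    for s in (0.0, 0.4, 1.0):
        print(s, dispersion_ratio(s, trials=1), dispersion_ratio(s, trials=5))
```

With `trials=1` the ratio stays at 1 for every value of `sigma2`, because the variance of a 0/1 response is determined by its mean; with `trials=5` the ratio grows with `sigma2`. Under these assumptions, this is one way to see why a nuisance random intercept can disturb the size of the tests for binomial responses but not for binary ones.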

4.2. Yale family study of comorbidity of alcoholism and anxiety (YFSCAA).
The YFSCAA was conducted to examine the patterns of familial
aggregation of alcoholism in the relatives of 115 probands with alcohol de- 
pendence compared to those of 147 psychiatric (80 probands with anxiety 
disorders) and normal controls (67 probands with no history of psychiatric 
disorders). The total sample used for the familial aggregation analyses in- 
cluded 222 probands who had 1194 adult first-degree relatives and spouses. 
We refer to [23] for a detailed description of the study design and data 
collection. Recently, Zhang, Feng and Zhu [34] developed a latent variable 
model as described in Example 1 and a two-step procedure for assessing 
familial aggregation and heritability of disease, based on the assumption 
that the elements of Vj follow a Bernoulli distribution. The importance of 
our reanalysis is to demonstrate how to use our new results to remove the 
restrictive Bernoulli assumption on Vj. 

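As in Section 2.4 of [34], for any $i$ and $j$ we have

$$\mathrm{var}(b_{i,j}) = \alpha_2^2 + \gamma_1(1-\gamma_1)\bigl[(1-\gamma_1)\alpha_3^2 + \gamma_1(\alpha_3+\alpha_4)^2 + (\alpha_3+\gamma_1\alpha_4)^2\bigr].$$

Similarly, we can get $\mathrm{cov}(b_{i,j}, b_{i,k})$ for all $j, k = 1, \ldots, m_i$. Let $\sigma_T = \mathrm{var}(b_{i,j})$.
Then $\mathrm{var}(b_i) = \Sigma_i$ equals $\sigma_T$ times a correlation matrix for the $i$th family. For example, for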
Table 2

Comparison of the type I error in the presence of nuisance overdispersion for the score
test statistics under the logistic and linear mixed models at significance level 0.05.
"Considered" and "ignored" represent including or excluding the correlation in the score
test statistics

Logistic mixed model (18) and Bernoulli distribution

              n = 30 (m_i = 5)                  n = 50 (m_i = 5)
         LS      So      Sp      Ss        LS      So      Sp      Ss
0.3      0.055   0.054   0.050   0.054     0.051   0.056   0.040   0.040
0.6      0.048   0.054   0.056   0.066     0.048   0.054   0.048   0.046
0.9      0.055   0.056   0.061   0.074     0.042   0.046   0.066   0.048
1.2      0.058   0.058   0.070   0.068     0.045   0.060   0.061   0.060

Logistic mixed model (18) and binomial distribution

              n = 30 (m_i = 5)                  n = 50 (m_i = 5)
         LS      So      Sp      Ss        LS      So      Sp      Ss
0.2      0.131   0.195   0.054   0.105     0.163   0.346   0.055   0.154
0.4      0.265   0.522   0.052   0.188     0.364   0.784   0.062   0.298
0.6      0.420   0.792   0.061   0.276     0.568   0.964   0.062   0.435
0.8      0.558   0.942   0.063   0.374     0.734   0.998   0.073   0.568
1.0      0.670   0.987   0.061   0.444     0.863   1.000   0.068   0.673

Linear mixed model (19), n = 40 (m_i = 4)

                  ignored                          considered
         LS      So      Sp      Ss        LS      So      Sp      Ss
0.25     0.084   0.010   0.048   0.039     0.188   0.011   0.042   0.031
0.50     0.119   0.010   0.040   0.041     0.199   0.029   0.043   0.045
0.75     0.170   0.009   0.041   0.034     0.272   0.023   0.043   0.031
1.00     0.212   0.005   0.039   0.030     0.317   0.016   0.037   0.031


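a nuclear family with two siblings, we can show that

$$\mathrm{var}(b_i) = \Sigma_i = \sigma_T
\begin{pmatrix}
1 & \rho_2 & \rho_0 & \rho_0 \\
\rho_2 & 1 & \rho_0 & \rho_0 \\
\rho_0 & \rho_0 & 1 & \rho_1 \\
\rho_0 & \rho_0 & \rho_1 & 1
\end{pmatrix},$$

where $\rho_2 = \alpha_2^2/\sigma_T$, $\rho_0 = [\alpha_2^2 + \gamma_1(1-\gamma_1)(\alpha_3+\gamma_1\alpha_4)^2]/\sigma_T$ and
$\rho_1 = [\alpha_2^2 + \gamma_1(1-\gamma_1)(\alpha_3+\gamma_1\alpha_4)^2 + 0.25\gamma_1^2(1-\gamma_1)^2\alpha_4^2]/\sigma_T$.
Let $\alpha_2/S_\alpha = \cos(\gamma_2)$, $\alpha_3/S_\alpha = \sin(\gamma_2)\cos(\gamma_3)$ and
$\alpha_4/S_\alpha = \sin(\gamma_2)\sin(\gamma_3)$, where $S_\alpha = \sqrt{\alpha_2^2 + \alpha_3^2 + \alpha_4^2}$. Then
$\rho_0$, $\rho_1$ and $\rho_2$ can be written as functions of $\gamma_1$, $\gamma_2$ and $\gamma_3$, which are nuisance
parameters here. It is noteworthy that deriving these correlation parameters
is relatively straightforward for a general pedigree.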
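
The reparameterization above can be sketched in code. The following is a minimal illustration under stated assumptions, not the authors' implementation: the function names are hypothetical, the common scale $S_\alpha$ is set to 1 because it cancels in every ratio, and the matrix ordering (two parents followed by two siblings, with $\rho_2$ between the parents, $\rho_0$ between a parent and a child, and $\rho_1$ between the siblings) is an interpretation of the nuclear-family example.

```python
import math

def family_correlations(gamma1, gamma2, gamma3):
    """Return (rho0, rho1, rho2) as functions of the nuisance parameters.

    gamma1 lies in (0, 1); gamma2 and gamma3 are angles that encode
    alpha_2, alpha_3, alpha_4 up to the scale S_alpha, which cancels.
    """
    a2 = math.cos(gamma2)
    a3 = math.sin(gamma2) * math.cos(gamma3)
    a4 = math.sin(gamma2) * math.sin(gamma3)
    g = gamma1 * (1.0 - gamma1)
    shared = g * (a3 + gamma1 * a4) ** 2
    # sigma_T = var(b_ij), with S_alpha = 1.
    sigma_t = a2 ** 2 + g * ((1.0 - gamma1) * a3 ** 2
                             + gamma1 * (a3 + a4) ** 2
                             + (a3 + gamma1 * a4) ** 2)
    rho0 = (a2 ** 2 + shared) / sigma_t
    rho1 = (a2 ** 2 + shared + 0.25 * g ** 2 * a4 ** 2) / sigma_t
    rho2 = a2 ** 2 / sigma_t
    return rho0, rho1, rho2

def nuclear_family_matrix(gamma1, gamma2, gamma3):
    """4 x 4 correlation matrix; assumed order: parent, parent, sib, sib."""
    r0, r1, r2 = family_correlations(gamma1, gamma2, gamma3)
    return [[1.0, r2, r0, r0],
            [r2, 1.0, r0, r0],
            [r0, r0, 1.0, r1],
            [r0, r0, r1, 1.0]]
```

Two limiting cases give a quick sanity check. With `gamma2 = 0` only $\alpha_2$ is nonzero and all three correlations equal 1. With `gamma2 = pi/2` and `gamma3 = 0` only $\alpha_3$ is nonzero, so $\rho_2 = 0$ and $\rho_0 = \rho_1 = 1/2$ for any `gamma1` in (0, 1).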

The score test statistics presented in Section 2 can be used to detect the
familial correlation that includes both environmental and genetic factors
through testing the hypothesis $\sigma_T = 0$. Under the null hypothesis, the
maximum likelihood estimate is (1.3341, -0.4181, 0.0178, -1.6501, -1.1522). To
compute the maximum score test statistics, we used $r_0 = 10{,}000$ and
approximated the nuisance parameter domain $\Gamma = [0.01, 0.99] \times [-\pi/2, \pi/2] \times
[-\pi/2, \pi/2]$ by a 15 x 15 x 15 grid $\Gamma_A$, whose first coordinate takes the values
$i/16$, $i = 1, \ldots, 15$, and whose two angular coordinates take equally spaced values
indexed by $j, k = -7, \ldots, 7$. Accordingly, So, Sp and Ss equal 1.23, 64.22 and 63.91,
respectively. The p-value for So is 0.158, revealing no evidence for
overdispersion. The p-values for Sp and Ss are less than 0.0001, providing
significant support for familial aggregation and heritability of alcoholism. To
ensure that the size of the approximating grid does not affect our analysis,
we considered a series of grids from the smallest size 2 x 2 x 2 to the largest
size 100 x 100 x 100. The differences in the approximated values for So, Sp
and Ss are indeed so small that they had no impact on our analysis.
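
The grid approximation of the maximum statistic, and the check that the result is insensitive to the grid size, can be sketched as follows. This is an illustration only: `toy_statistic` is a hypothetical smooth stand-in for a score statistic as a function of the nuisance parameter, and the grid uses equally spaced interior points on each axis, which is an assumption and need not coincide with the grid used in the analysis.

```python
import itertools
import math

def make_grid(k):
    """k x k x k grid over [0.01, 0.99] x [-pi/2, pi/2] x [-pi/2, pi/2].

    Assumption: k equally spaced interior points per axis.
    """
    def axis(lo, hi):
        step = (hi - lo) / (k + 1)
        return [lo + step * (i + 1) for i in range(k)]
    return itertools.product(axis(0.01, 0.99),
                             axis(-math.pi / 2, math.pi / 2),
                             axis(-math.pi / 2, math.pi / 2))

def sup_statistic(stat, k):
    """Maximum of stat(gamma) over the k x k x k grid."""
    return max(stat(gamma) for gamma in make_grid(k))

def toy_statistic(gamma):
    """Hypothetical smooth stand-in for a statistic indexed by gamma."""
    g1, g2, g3 = gamma
    return 60.0 + 4.0 * math.sin(math.pi * g1) * math.cos(g2) * math.cos(g3)

if __name__ == "__main__":
    for k in (2, 5, 15, 30):
        print(k, sup_statistic(toy_statistic, k))
```

For a smooth statistic such as this stand-in, the maximum over a moderate grid is already close to the maximum over a much finer one, which is the kind of stability the grid comparison above is meant to confirm. How quickly the approximation stabilizes for a real statistic depends on how sharply it varies in the nuisance parameter.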

5. Discussion. We have proposed several score statistics to test homo- 
geneity and overdispersion in the mixed effects and latent variable mod- 
els. The major advantage of these statistics is that they do not depend on 
the distribution of the random effects except for their mean and variance. 
Simulation studies demonstrate that both Sp and Ss have great power in 
detecting the heterogeneity in latent variables, but the type I error of Ss 
is inflated in the case of nuisance overdispersion (Table 2). We have also 
examined a number of simulated data sets and one real application to high- 
light the broad spectrum of the applications for which our test procedures 
can be used. Another advantage of these statistics is that they automatically
impose the positive semidefinite constraint on the variance components
of the $\Sigma_i(\gamma)$'s. For the model in Example 2, the statistic Ss reduces to the
projection score test statistic [16], which asymptotically follows a mixture of $\chi^2$
distributions under $H_0$ [26]. The simulation studies in Section 4 suggest
that using a constrained score test can substantially increase the power of 
detecting heterogeneity. See [38] for detailed discussion. 
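
To make the phrase "mixture of $\chi^2$ distributions" concrete, here is a minimal sketch of a p-value computation for the simplest such mixture, $w_0\chi^2_0 + (1 - w_0)\chi^2_1$, where $\chi^2_0$ is a point mass at zero. The default weight $w_0 = 0.5$ is an assumption that corresponds to a single variance component; the mixing weights in other models are different and are not derived here. The function names are hypothetical.

```python
import math

def chi2_1_sf(x):
    """P(chi-square with 1 d.f. > x) for x >= 0, via the normal tail."""
    return math.erfc(math.sqrt(x / 2.0))

def mixture_pvalue(stat, w0=0.5):
    """p-value under the mixture w0 * chi2_0 + (1 - w0) * chi2_1.

    The point mass at zero contributes only when stat <= 0.
    w0 = 0.5 is an assumed weight (single variance component).
    """
    if stat <= 0.0:
        return 1.0
    return (1.0 - w0) * chi2_1_sf(stat)
```

Under this equal-weight mixture the 5% critical value is about 2.71 rather than the 3.84 of a plain $\chi^2_1$ test, which illustrates how a constrained test can gain power over an unconstrained one.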

The score statistics proposed here are to test whether all the variance 
components are zero. When the null hypothesis is rejected, it is also of 
interest to test whether some of the variance components are zero. The 
advantage of testing the overall hypothesis on all the variance components, 
followed by identifying some nonzero components, is control of the type I 
error. Although we do not discuss how to identify the nonzero components, 
our results can be useful for this purpose. For instance, Ss can be directly 
applied to clustered designs [22] as $n \to \infty$ and when all $m_i$'s are bounded
by a constant, and the asymptotic distribution of Ss can also be derived 
under an M-dependent sequence in nested models by using the functional 
central limit theorem for dependent data [13]. In particular, we can follow the 
derivation in Section 2 to develop a score test by using the parameterization 
given in Section 1, and use a parametric bootstrap (or resampling procedure) 
to approximate the p-value. However, it is beyond the scope of this work to
address all these related issues in detail. 
Many issues still merit further research. One major issue is the empirical
performance of the test statistics in finite samples under different situations,
such as proportional hazard models with random effects [30] and genetic
linkage tests. Other possible applications include tests of spatial homogeneity
for spatial processes, tests of serial correlation for state space models and 
tests of the Markov (or semi-Markov) hypothesis [8]. In addition, our result 
can be used to address important practical problems such as the selection of 
the random effects components in a generalized linear mixed model [7, 11]. 
Our results combined with those in [10] may also be useful in mixed effects 
models for semiparametric regression. It is noteworthy that our assumptions 
in this paper are not optimal. Some extensions are still possible and warrant 
future research. Another major issue is to further assess the impact of the 
grid dimension on the quality of approximation, even though some empirical 
evidence suggests that a rough grid works very well. Also see [1] and [36]. 

APPENDIX: PROOFS AND TECHNICAL DETAILS 
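We introduce some notation as follows:

$$x_K = \frac{U_K}{\sqrt{EU_K^2}}, \qquad
c_{K,K'}(\gamma) = b_{K,K'}(\gamma)\frac{\sqrt{EU_K^2\,EU_{K'}^2}}{\sqrt{I_{TP}(\gamma)}},$$

$$y_K = \frac{U_K^2 - V_K}{\sqrt{\mathrm{Var}(U_K^2 - V_K)}}, \qquad
d_K(\gamma) = b_{K,K}(\gamma)\frac{\sqrt{\mathrm{Var}(U_K^2 - V_K)}}{\sqrt{I_{TO}(\gamma)}}.$$

Thus, $X_P(\gamma) = \sum_{K \ne K'} c_{K,K'}(\gamma)x_K x_{K'}$, $X_O(\gamma) = \sum_K d_K(\gamma)y_K$ and

$$X_S(\gamma) = \frac{T_S(\gamma)}{\sqrt{I_{TS}(\gamma)}}
= \lambda_{N(1)}(\gamma)\sum_{K \ne K'} c_{K,K'}(\gamma)x_K x_{K'} + \lambda_{N(2)}(\gamma)\sum_K d_K(\gamma)y_K.$$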

A.l. Regularity conditions of Theorem 1. 

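(A1) As $N \to \infty$, $X_P(\gamma)$ converges to a Gaussian process with mean
zero and covariance $\rho_{(1)}(\gamma, \gamma')$ and $\lim_{N\to\infty}\sup_{\gamma\in\Gamma}|\mu_{\max}[C(\gamma)]| = 0$,
where $\mu_{\max}[C(\gamma)]$ is the largest absolute eigenvalue of $C(\gamma) = (c_{K,K'}(\gamma))$.

(A2) $\lim_{N\to\infty}\sup_K\sup_{\gamma\in\Gamma}|d_K(\gamma)| = 0$. The $d_K(\gamma)$'s and $c_{K,K'}(\gamma)$'s have
first-order derivatives with respect to $\gamma$ and for any $\{\gamma_t, t = 1, \ldots, q_2\}$,
$\sum_K \sup_{\gamma\in\Gamma}[\partial_{\gamma_t}d_K(\gamma)]^2 < \infty$ and $\sum_{K \ne K'}\sup_{\gamma\in\Gamma}[\partial_{\gamma_t}c_{K,K'}(\gamma)]^2 < \infty$.

(A3) The sequence $\{\sup_{\gamma\in\Gamma}|d_K(\gamma)||y_K| : K = 1, \ldots, N\}$ satisfies the
Lindeberg condition.

(A4) For any $\gamma$ and $\gamma'$ in $\Gamma$,
$\lim_{N\to\infty}\rho_{N(2)}(\gamma, \gamma') = \lim_{N\to\infty}\sum_K d_K(\gamma)d_K(\gamma') = \rho_{(2)}(\gamma, \gamma')$.

(A5) $\lim_{N\to\infty}\sup_{\gamma\in\Gamma}|\lambda_{N(1)}(\gamma) - \lambda_1(\gamma)| = 0$ and $\lambda_1(\gamma)$ is continuous in $\gamma$.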

Proof of Theorem 1. In terms of $\sum_K d_K(\gamma)y_K$, we can directly apply
the Jain-Marcus theorem [29] by using assumptions (A2)-(A4). The
finite-dimensional convergence of $X_S(\gamma)$ can be observed from Theorem 5 of [15] by using
assumptions (A1)-(A3). To prove the asymptotic equicontinuity of $X_S(\gamma)$,
we note that for any $\gamma$ and $\gamma'$ in $\Gamma$, $|X_S(\gamma) - X_S(\gamma')|$ is bounded by

$$|\lambda_{N(1)}(\gamma)X_P(\gamma) - \lambda_{N(1)}(\gamma')X_P(\gamma')| + |\lambda_{N(2)}(\gamma)X_O(\gamma) - \lambda_{N(2)}(\gamma')X_O(\gamma')|.$$

The first term $|\lambda_{N(1)}(\gamma)X_P(\gamma) - \lambda_{N(1)}(\gamma')X_P(\gamma')|$ is bounded by

$$|\lambda_{N(1)}(\gamma) - \lambda_{N(1)}(\gamma')||X_P(\gamma)| + |\lambda_{N(1)}(\gamma')||X_P(\gamma) - X_P(\gamma')|.$$

From (A5), it follows that $|\lambda_{N(1)}(\gamma) - \lambda_{N(1)}(\gamma')|$ can be sufficiently small
when $\gamma$ and $\gamma'$ are sufficiently close. Using the fact that $X_P(\gamma) = O_p(1)$,
$\lambda_{N(1)}(\gamma') \le 1$ and $X_P(\gamma)$ is stochastically continuous, we can prove that for
any $\varepsilon, \eta > 0$, there exists a $\delta > 0$ such that

$$\lim_{N\to\infty} P\Bigl[\sup_{\|\gamma - \gamma'\| < \delta}|\lambda_{N(1)}(\gamma)X_P(\gamma) - \lambda_{N(1)}(\gamma')X_P(\gamma')| > \varepsilon\Bigr] < \eta.$$

By using similar arguments, we can handle the second term. Therefore,
$X_S(\gamma)$ is stochastically continuous. This completes the proof of Theorem 1. □

Remark 1. Assumption (Al) will be established in Theorem 3 where 
we introduce sufficient conditions (C1)-(C4). Assumptions (A2)-(A4) can 
be replaced by the assumptions of Theorem 10.6 of [25], but for simplicity, 
we prefer (A2)-(A4) because they can be easily checked for all examples 
considered here as well as those in [38]. 

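Note that $U_K$ depends on $\xi_*$ implicitly. So, we denote it by $U_K(hN^{-1/2})$
when we replace $\xi_*$ in $U_K$ by $\xi_* + hN^{-1/2}$. We introduce similar notation
for $V_K$, $\{I_{TO}(\gamma), I_{TS}(\gamma), I_{TP}(\gamma)\}$ and $\{I_{EO}(\gamma), I_{ES}(\gamma), I_{EP}(\gamma)\}$. After
replacing $\xi_*$ by $\xi_* + hN^{-1/2}$ in $T_P(\gamma)$, $T_S(\gamma)$ and $T_O(\gamma)$, we get $T_P(\gamma, h)$,
$T_S(\gamma, h)$ and $T_O(\gamma, h)$. We define $\lambda_{N(3)}(\gamma) = \sqrt{I_{EP}(\gamma)/I_{ES}(\gamma)}$,
$\lambda_{N(4)}(\gamma) = \sqrt{I_{EO}(\gamma)/I_{ES}(\gamma)}$,
$y_K(h) = [U_K(hN^{-1/2})^2 - V_K(hN^{-1/2})]/\sqrt{\mathrm{Var}(U_K^2 - V_K)}$,

$$x_K(h) = \frac{U_K(hN^{-1/2})}{\sqrt{EU_K^2}} \quad\text{and}\quad
d_{0,K}(\gamma) = b_{K,K}(\gamma)\frac{\sqrt{\mathrm{Var}(U_K^2 - V_K)}}{\sqrt{I_{EO}(\gamma)}}.$$

With these preparations, we get

$$X_O(\gamma, h) = \sum_K d_{0,K}(\gamma)y_K(h)\sqrt{\frac{I_{EO}(\gamma)}{I_{EO}(\gamma, hN^{-1/2})}},$$

$$X_P(\gamma, h) = \sum_{K \ne K'} c_{K,K'}(\gamma)x_K(h)x_{K'}(h)\sqrt{\frac{I_{EP}(\gamma)}{I_{EP}(\gamma, hN^{-1/2})}}$$

and

$$X_S(\gamma, h) = [\lambda_{N(3)}(\gamma)X_P(\gamma, h) + \lambda_{N(4)}(\gamma)X_O(\gamma, h)]\sqrt{\frac{I_{ES}(\gamma)}{I_{ES}(\gamma, hN^{-1/2})}}.$$

The following conditions are assumed for Theorem 2.
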

A.2. Regularity conditions of Theorem 2. 

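(B1) $\lim_{N\to\infty}\sup_{\gamma\in\Gamma, \|h\|_2 \le M}\|I_{EO}(\gamma)[I_{EO}(\gamma, hN^{-1/2})]^{-1} - 1\| = 0$ and

$$\lim_{N\to\infty}\sup_{\gamma\in\Gamma, \|h\|_2 \le M}\left|\frac{I_{EP}(\gamma)}{I_{EP}(\gamma, hN^{-1/2})} - 1\right| = 0.$$

(B2) For any $\|h\|_2 \le M$, $\sup_K \mathrm{Var}[y_K(h) - y_K(0) - \partial_h y_K(0)^T h] \to 0$ as
$N \to \infty$ and $|y_K(h) - y_K(h')| \le z_K\|h - h'\|_2$, where $\sup_K E(z_K) < \infty$. In
addition, $\sup_K E[y_K(0)^2 + \|\partial_h y_K(0)\|_2^2] < \infty$, $\sup_{\gamma\in\Gamma}\sum_{K=1}^N [d_{0,K}(\gamma)]^2 < \infty$
and $\sum_{K=1}^N \sup_{\gamma\in\Gamma}[\partial_{\gamma_t}d_{0,K}(\gamma)]^2 < \infty$.

(B3) $N^{1/2}(\hat\xi - \xi_*) = N^{-1/2}\sum_{K=1}^N F_K + o_p(1)$, $N^{-1/2}\sum_{K=1}^N F_K = O_p(1)$, and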



lim sup 



A 



y/lEoin) 



J sin) 



(B4) $f_K(\gamma) = b_{K,K}(\gamma)/\sqrt{I_{EO}(\gamma)}$ has the first-order derivative with
respect to $\gamma$, and for any $\{\gamma_t, t = 1, \ldots, q_2\}$, $\sum_K \sup_{\gamma\in\Gamma}[\partial_{\gamma_t}f_K(\gamma)]^2 < \infty$. We
assume similar conditions for all components of $J_{N(0)}(\gamma) = J_N(\gamma)/\sqrt{I_{EO}(\gamma)}$.

(B5) The sequence $\{\sup_{\gamma\in\Gamma}|f_K(\gamma)(U_K^2 - V_K) - J_{N(0)}(\gamma)^T F_K| : K = 1, \ldots, N\}$
satisfies the Lindeberg condition.

(B6) For any $\gamma$ and $\gamma'$ in $\Gamma$, $\lim_{N\to\infty}\rho_{N(3)}(\gamma, \gamma') = \rho_{(3)}(\gamma, \gamma')$, where
$\rho_{N(3)}(\gamma, \gamma')$ is given by

$$\sum_K \mathrm{Cov}\bigl[f_K(\gamma)(U_K^2 - V_K) - J_{N(0)}(\gamma)^T F_K,\; f_K(\gamma')(U_K^2 - V_K) - J_{N(0)}(\gamma')^T F_K\bigr].$$

(B7) For any given $M > 0$,
$\sup_{\gamma\in\Gamma, \|h\|_2 \le M}|\sum_{K \ne K'} c_{K,K'}(\gamma)[x_K(h)x_{K'}(h) - x_K x_{K'}]| = o_p(1)$,
and $X_P(\gamma, 0)$ converges in distribution to a Gaussian process
with mean zero and covariance $\rho_{(1)}(\gamma, \gamma')$.

(B8) As $N \to \infty$, $\lambda_{N(3)}(\gamma)$ uniformly converges to $\lambda_{(3)}(\gamma)$ for all $\gamma \in \Gamma$,
and $\lambda_{(3)}(\gamma)$ is continuous in $\gamma$.

Remark 2. Some sufficient conditions for (B3) in a general mixed model 
have been given by Jiang [18]. Some sufficient conditions for (B7) will be 
given in (C1)-(C5), and other conditions will be given in Theorem 5. Also 
see Theorems 3 and 5 for more details. 



Proof of Theorem 2. The proof of Theorem 2 consists of three steps.
In the first step, we will establish that

$$X_O(\gamma, h) = \sum_K d_{0,K}(\gamma)[y_K(0) + \partial_h y_K(0)^T h](1 + o_p(1)) + o_p(1). \qquad (20)$$

From (B1), it follows that $X_O(\gamma, h) = \sum_K d_{0,K}(\gamma)y_K(h)(1 + o_p(1))$.
Furthermore, we consider the stochastic process
$SP(I) = \sum_K d_{0,K}(\gamma)[y_K(h) - y_K(0) - \partial_h y_K(0)^T h]$ indexed by
$\{(\gamma, h) : \gamma \in \Gamma, \|h\|_2 \le M\}$. For each fixed
$(\gamma, h)$, the variance of $SP(I)$ converges to zero and (B2) leads to the result
that $SP(I)$ is stochastically continuous. Thus, (20) is proved. We can use
(B3) to deduce that $X_O(\gamma, \sqrt{N}(\hat\xi - \xi_*))$ can be approximated by

$$\sum_K [f_K(\gamma)(U_K^2 - V_K) - J_{N(0)}(\gamma)^T F_K][1 + o_p(1)] + o_p(1);$$

therefore, $X_O(\gamma)$ converges to a Gaussian process with mean zero and
covariance $\rho_{(3)}(\gamma, \gamma')$ because (B4)-(B6) are sufficient conditions for this
claim.

The second step is to show that

$$X_P(\gamma, \sqrt{N}(\hat\xi - \xi_*)) = \sum_{K \ne K'} c_{K,K'}(\gamma)x_K x_{K'}[1 + o_p(1)] + o_p(1).$$

This can be proved by using (B1) and (B7), which ends the second step. As
the last step, we combine the results on $X_P(\gamma, h)$ and $X_O(\gamma, h)$ and then
follow the proof of Theorem 1 to complete the proof for Theorem 2. □
A.3. Regularity conditions of Theorem 3.

(C1) Let $x_1, \ldots, x_N$ be a sequence of independent random variables such
that $Ex_K = 0$ and $Ex_K^2 = 1$ for all $K = 1, \ldots, N$. We assume that
$\sup_K\{E|x_K|^p\} < \infty$ for some integer $p$ greater than $\max(q_2, 2)$.

(C2) $\mathrm{Var}[X_P(\gamma)] = 2\sum_{K \ne K'} c_{K,K'}^2(\gamma) \to 2$ and
$\lim_{N\to\infty}\sup_{\gamma\in\Gamma}|\mu_{\max}[C(\gamma)]| = 0$.

(C3) $c_{K,K'}(\gamma)$ has continuous first-order and second-order derivatives with
respect to $\gamma$. Let $\gamma_t$ be any component of $\gamma$. We assume that
$\sum_{K \ne K'}\sup_{\gamma\in\Gamma}[\partial_{\gamma_t}c_{K,K'}(\gamma)]^2 < \infty$ for $t = 1, \ldots, q_2$.

(C4) For any $\gamma$ and $\gamma'$ in $\Gamma$,

$$\lim_{N\to\infty}\rho_{N(1)}(\gamma, \gamma') = \lim_{N\to\infty}\sum_{K \ne K'} c_{K,K'}(\gamma)c_{K,K'}(\gamma') = \rho_{(1)}(\gamma, \gamma').$$

Proof of Theorem 3. First, we need to show that any finite-dimensional
distributions of $\{X_P(\gamma) : \gamma \in \Gamma\}$ converge weakly to the corresponding
finite-dimensional distributions of $\{G_P(\gamma) : \gamma \in \Gamma\}$. From (C1)-(C3) and the
martingale central limit theorem, it follows that $\sum_{K \ne K'} c_{K,K'}(\gamma)x_K x_{K'}$
converges to the standard normal in distribution for any $\gamma \in \Gamma$; see [24] and [15].

Let us consider two points $\gamma_1$ and $\gamma_2$ in $\Gamma$. By using the Cramér-Wold
device, we need to show that for any $a_1$ and $a_2$ in $R$,

$$X_P(\gamma_1, \gamma_2) = a_1 X_P(\gamma_1) + a_2 X_P(\gamma_2) \to^{L} N[0, 2a_1^2 + 2a_2^2 + 4a_1 a_2\rho_{(1)}(\gamma_1, \gamma_2)].$$

From (C2) and (C4), we know that $\mathrm{Var}[X_P(\gamma_1, \gamma_2)]$ converges to
$2a_1^2 + 2a_2^2 + 4a_1 a_2\rho_{(1)}(\gamma_1, \gamma_2)$. If $a_1^2 + a_2^2 + 2a_1 a_2\rho_{(1)}(\gamma_1, \gamma_2) = 0$, then
$X_P(\gamma_1, \gamma_2)$ converges to zero in probability and $a_1 G_P(\gamma_1) + a_2 G_P(\gamma_2) = 0$.
In other cases, we have $a_1^2 + a_2^2 + 2a_1 a_2\rho_{(1)}(\gamma_1, \gamma_2) > 0$. From (C2), it follows that

$$|\mu_{\max}[a_1 C(\gamma_1) + a_2 C(\gamma_2)]| \le |a_1||\mu_{\max}[C(\gamma_1)]| + |a_2||\mu_{\max}[C(\gamma_2)]| \to 0.$$

Thus, $X_P(\gamma_1, \gamma_2)$ converges to the desired normal random variable in
distribution. Similarly, we can generalize this result to any finite cases.

From Lemma 1.3 of [24], it follows that
$\{E|\sum_{K \ne K'}[c_{K,K'}(\gamma_1) - c_{K,K'}(\gamma_2)]x_K x_{K'}|^p\}^{2/p} \le C\sum_{K \ne K'}[c_{K,K'}(\gamma_1) - c_{K,K'}(\gamma_2)]^2$,
where $C$ is a scalar independent of $N$. By using (C3), we have
$\{E|\sum_{K \ne K'}[c_{K,K'}(\gamma_1) - c_{K,K'}(\gamma_2)]x_K x_{K'}|^p\}^{1/p} \le C\|\gamma_1 - \gamma_2\|_2$.
To prove the stochastic equicontinuity of $X_P(\gamma)$,
we just need to show that
$E\sup_{\|\gamma_1 - \gamma_2\|_2 \le \delta}|\sum_{K \ne K'}[c_{K,K'}(\gamma_1) - c_{K,K'}(\gamma_2)]x_K x_{K'}|^p \to 0$
as $\delta \to 0$ and $N \to \infty$. We can finish our proof by noting that
$\Gamma$ is a bounded compact set of $R^{q_2}$, whose packing number $D(t, \Gamma, \|\cdot\|_2)$ is
of the order of $t^{-q_2}$. Theorem 2.2.4 of [29] concludes the proof. □
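
The variance identity behind (C2), $\mathrm{Var}[\sum_{K \ne K'} c_{K,K'}x_K x_{K'}] = 2\sum_{K \ne K'} c_{K,K'}^2$ for symmetric coefficients and independent $x_K$ with mean zero and unit variance, can be checked by simulation. The sketch below is a minimal illustration under stated assumptions: the coefficient matrix is randomly generated and hypothetical, it is scaled so that the sum of its squared off-diagonal entries equals 1, and the $x_K$ are drawn from a uniform distribution to emphasize that normality of the $x_K$ is not needed for the identity.

```python
import random

def make_coefficients(n, rng):
    """Symmetric n x n matrix with zero diagonal, scaled so that the sum
    of squared off-diagonal entries is 1 (hypothetical coefficients)."""
    c = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            c[i][j] = c[j][i] = rng.gauss(0.0, 1.0)
    norm = sum(c[i][j] ** 2 for i in range(n) for j in range(n)) ** 0.5
    return [[v / norm for v in row] for row in c]

def quadratic_form(c, x):
    """Sum over K != K' of c[K][K'] * x[K] * x[K'] (diagonal of c is 0)."""
    n = len(x)
    return sum(c[i][j] * x[i] * x[j] for i in range(n) for j in range(n))

def simulated_moments(n=20, reps=4000, seed=1):
    """Monte Carlo mean and variance of the random quadratic form."""
    rng = random.Random(seed)
    c = make_coefficients(n, rng)
    half_width = 3.0 ** 0.5  # uniform(-sqrt(3), sqrt(3)): mean 0, variance 1
    draws = [quadratic_form(c, [rng.uniform(-half_width, half_width)
                                for _ in range(n)])
             for _ in range(reps)]
    mean = sum(draws) / reps
    var = sum((d - mean) ** 2 for d in draws) / (reps - 1)
    return mean, var
```

With this scaling the simulated mean should be near 0 and the simulated variance near 2, up to Monte Carlo error. The same calculation applied to $a_1 C(\gamma_1) + a_2 C(\gamma_2)$ gives the limiting variance $2a_1^2 + 2a_2^2 + 4a_1 a_2\rho_{(1)}(\gamma_1, \gamma_2)$ used in the Cramér-Wold step.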

Proof of Theorem 4. First, we will prove the unconditional weak
convergence of $X_P^{(r)}(\gamma)$. After some calculations, we can show that $E[X_P^{(r)}(\gamma)] = 0$
and $\mathrm{Cov}[X_P^{(r)}(\gamma_1), X_P^{(r)}(\gamma_2)] = 2\sum_{K \ne K'} c_{K,K'}(\gamma_1)c_{K,K'}(\gamma_2) = 2\rho_{N(1)}(\gamma_1, \gamma_2)$.
Therefore, $X_P^{(r)}(\gamma)$ and $X_P(\gamma)$ have the same mean and covariance
structures. Following the proof of Theorem 4 in [15], we can show that $X_P^{(r)}(\gamma)$
as the sum of martingale differences converges to a normal random variable
for any $\gamma \in \Gamma$, as $N \to \infty$. The Cramér-Wold device is applicable in any
finite case. Following the similar argument in Theorem 3, we can show the
stochastic continuity of $X_P^{(r)}(\gamma)$. Therefore, $X_P^{(r)}(\gamma)$ converges to $G_P(\gamma)$ in
distribution; that is, $X_P^{(r)}(\gamma)$ is asymptotically measurable.

Second, given $x_1, \ldots, x_N$, we have
$X_P^{(r)}(\gamma) \to^{L} N[0, 2\sum_{K \ne K'} c_{K,K'}(\gamma)^2 x_K^2 x_{K'}^2]$
and $E_v[X_P^{(r)}(\gamma)X_P^{(r)}(\gamma')]$ is equal to $2\sum_{K \ne K'} c_{K,K'}(\gamma)c_{K,K'}(\gamma')x_K^2 x_{K'}^2$. We
write $E_v[X_P^{(r)}(\gamma)X_P^{(r)}(\gamma')]/2$ as the sum of
$\sum_{K \ne K'} c_{K,K'}(\gamma)c_{K,K'}(\gamma')(x_K^2 - 1)(x_{K'}^2 - 1)$,
$2\sum_{K \ne K'} c_{K,K'}(\gamma)c_{K,K'}(\gamma')(x_K^2 - 1)$ and $\rho_{N(1)}(\gamma, \gamma')$. The first
term is also a random quadratic form. Its mean is zero and its variance is
bounded by $C\max_K\{\sum_{K'=1}^N c_{K,K'}(\gamma')^2\}$, which converges to zero;
see Lemma 1.2 of [24]. By using Theorem 1 of [31], we can show that
$\sum_{K \ne K'} c_{K,K'}(\gamma)c_{K,K'}(\gamma')(x_K^2 - 1)(x_{K'}^2 - 1)$ converges to zero in probability.
The same technique can be used to show that
$\sum_{K \ne K'} c_{K,K'}(\gamma)c_{K,K'}(\gamma')(x_K^2 - 1)$ converges to zero in probability. Thus,
$E_v[X_P^{(r)}(\gamma)X_P^{(r)}(\gamma')] \to 2\rho_{(1)}(\gamma, \gamma')$
in probability. We can obtain the marginal convergence in the conditional
central limit theorem by using the Cramér-Wold method.

For each $\delta > 0$, let $\Gamma_\delta$ assign to each $\gamma \in \Gamma$ a closest element of a given finite
$\delta$-net of $\Gamma$ with respect to $\|\cdot\|_2$. The above finite convergence results lead to
$\sup_{h \in BL_1(\ell^\infty(\Gamma))}|E_v h(X_P^{(r)}(\Gamma_\delta(\cdot))) - Eh(G_P(\Gamma_\delta(\cdot)))| \to 0$ in probability, as
$N \to \infty$. By continuity of $G_P(\gamma)$, we have $G_P(\Gamma_\delta(\gamma)) \to G_P(\gamma)$ almost surely,
as $\delta \to 0$; that is, $\lim_{\delta\to 0}\sup_{h \in BL_1(\ell^\infty(\Gamma))}|Eh(G_P(\Gamma_\delta(\cdot))) - Eh(G_P(\cdot))| = 0$.

Finally, $\sup_{h \in BL_1(\ell^\infty(\Gamma))}|E_v h(X_P^{(r)}(\Gamma_\delta(\cdot))) - E_v h(X_P^{(r)}(\cdot))|$ is bounded by
$E_v(\sup_{\|\gamma - \gamma'\|_2 < \delta}|X_P^{(r)}(\gamma') - X_P^{(r)}(\gamma)|)$. Because the expectation of the
left-hand side is bounded by $E(\sup_{\gamma, \gamma' \in \Gamma : \|\gamma - \gamma'\|_2 < \delta}|X_P^{(r)}(\gamma') - X_P^{(r)}(\gamma)|)$,
whose convergence to zero was established by the unconditional weak convergence
of $X_P^{(r)}(\gamma)$, the desired results follow. □
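
The conditional second-moment formula used in the second step can also be checked numerically. The sketch below assumes that the resampled process has the multiplier form $X_P^{(r)}(\gamma) = \sum_{K \ne K'} c_{K,K'}(\gamma)x_K x_{K'}v_K v_{K'}$ with $v_1, \ldots, v_N$ independent $N(0, 1)$ variables generated independently of the data; this form is inferred from the conditional variance stated in the proof and is an assumption of the sketch, as are the randomly generated coefficients and the fixed sample `x`.

```python
import random

def resampled_form(c, x, v):
    """Sum over K != K' of c[K][K'] * x[K] * x[K'] * v[K] * v[K']."""
    n = len(x)
    z = [x[k] * v[k] for k in range(n)]
    return sum(c[i][j] * z[i] * z[j]
               for i in range(n) for j in range(n) if i != j)

def conditional_variance(c, x):
    """Closed form 2 * sum_{K != K'} c^2 x_K^2 x_K'^2 for symmetric c."""
    n = len(x)
    return 2.0 * sum((c[i][j] * x[i] * x[j]) ** 2
                     for i in range(n) for j in range(n) if i != j)

def monte_carlo_check(n=12, reps=6000, seed=2):
    """Compare the simulated conditional variance with the closed form."""
    rng = random.Random(seed)
    # Hypothetical symmetric coefficients and one fixed "observed" sample.
    c = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            c[i][j] = c[j][i] = rng.gauss(0.0, 1.0) / n
    x = [rng.gauss(0.0, 1.0) for _ in range(n)]
    # Only the multipliers v are redrawn; x stays fixed (conditioning).
    draws = [resampled_form(c, x, [rng.gauss(0.0, 1.0) for _ in range(n)])
             for _ in range(reps)]
    mean = sum(draws) / reps
    var = sum((d - mean) ** 2 for d in draws) / (reps - 1)
    return var, conditional_variance(c, x)
```

Holding `x` fixed and redrawing only the multipliers mimics the conditioning on the observed data. The two returned values should agree up to Monte Carlo error, and the closed form depends on the data only through the $x_K^2$.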

Next, we state a few more assumptions. Let
$\tilde U_K(hN^{-1/2}) = U_K(s_K, hN^{-1/2}) - \mu_K(hN^{-1/2})$, where
$\mu_K(hN^{-1/2}) = EU_K(s_K, hN^{-1/2})$.

A.4. Regularity conditions of Theorem 5.

(C5) $U_K(s_K, \xi)$ has continuous first-order and second-order derivatives
with respect to $\xi$ in an open neighborhood of $\xi_*$, denoted by $\partial_\xi U_K(s_K, \xi)$
and $\partial_\xi^2 U_K(s_K, \xi)$, respectively.

(C6) $\sup_{K,K'}\mathrm{Var}\{\tilde U_K(hN^{-1/2})\tilde U_{K'}(hN^{-1/2}) - x_K x_{K'}\} \to 0$ as $N \to \infty$.

(C7) $\sup_{\|h\|_2 \le M}\sum_K \mu_K(hN^{-1/2})^2 < \infty$,
$\sup_K\sup_{\|h\|_2 \le M} E^{1/p}|\tilde U_K(hN^{-1/2})|^p < \infty$ and
$\sup_K E^{1/p}|\tilde U_K(hN^{-1/2}) - \tilde U_K(h'N^{-1/2})|^p \le c\|h' - h\|_2$
for some integer $p > q_1 + q_2$.

(C8) $\sup_{\gamma\in\Gamma, \|h\|_2 \le M}|\sum_{K \ne K'} c_{K,K'}(\gamma)\mu_K(hN^{-1/2})\tilde U_{K'}(hN^{-1/2})| = o_p(1)$.

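Proof of Theorem 5. We see that $X_P(\gamma, hN^{-1/2})$ can be written as
the sum of $\sum_{K \ne K'} c_{K,K'}(\gamma)\tilde U_K(hN^{-1/2})\tilde U_{K'}(hN^{-1/2})$,

$$2\sum_{K \ne K'} c_{K,K'}(\gamma)\mu_K(hN^{-1/2})\tilde U_{K'}(hN^{-1/2})$$

and $\sum_{K \ne K'} c_{K,K'}(\gamma)\mu_K(hN^{-1/2})\mu_{K'}(hN^{-1/2})$. In the following, we will prove
that the second and third terms converge to zero in probability and that the
first term differs from $X_P(\gamma, 0)$ by a term that converges to zero in probability.
For the third term, we have

$$\text{term (III)} \le \sup_{\gamma\in\Gamma}\mu_{\max}[C(\gamma)]\sup_{\|h\|_2 \le M}\sum_K \mu_K(hN^{-1/2})^2,$$

which converges to zero as $N \to \infty$. The second term (II) is
just assumption (C8).

For the first term, we need to consider the process $T_N(\gamma, h) = \text{term (I)} - X_P(\gamma, 0)$.
For each $\gamma$ and $h$, $T_N(\gamma, h)$ has mean zero and variance given by

$$2\sum_{K \ne K'} c_{K,K'}(\gamma)^2\,\mathrm{Var}\{\tilde U_K(hN^{-1/2})\tilde U_{K'}(hN^{-1/2}) - x_K x_{K'}\},$$

which converges to zero by assumption (C6). To establish stochastic
continuity of $T_N(\gamma, h)$, we find that $T_N(\gamma, h) - T_N(\gamma', h') = (a) + (b) + (c) + (d)$,
where each term on the right-hand side is given by

$$(a) = \sum_{K \ne K'}[c_{K,K'}(\gamma) - c_{K,K'}(\gamma')]\tilde U_K(hN^{-1/2})[\tilde U_{K'}(hN^{-1/2}) - x_{K'}],$$

$$(b) = \sum_{K \ne K'}[c_{K,K'}(\gamma) - c_{K,K'}(\gamma')]x_{K'}[\tilde U_K(hN^{-1/2}) - x_K],$$

$$(c) = \sum_{K \ne K'} c_{K,K'}(\gamma')\tilde U_K(h'N^{-1/2})[\tilde U_{K'}(hN^{-1/2}) - \tilde U_{K'}(h'N^{-1/2})],$$

$$(d) = \sum_{K \ne K'} c_{K,K'}(\gamma')\tilde U_{K'}(hN^{-1/2})[\tilde U_K(hN^{-1/2}) - \tilde U_K(h'N^{-1/2})].$$

Using the same technique as in Lemma 1.3 of [24], we can finish the proof
by using Theorem 2.2.4 of [29] and assumption (C7). □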






Acknowledgments. We thank the Editor, an Associate Editor and three 
anonymous referees for valuable suggestions which helped improve our pre- 
sentation greatly. We thank Dr. Kathleen Merikangas for making her al- 
coholism data set available to us. Reprints can be requested via e-mail: 
heping.zhang@yale.edu. 

REFERENCES 

[1] Andrews, D. W. K. (2001). Testing when a parameter is on the boundary of the
    maintained hypothesis. Econometrica 69 683-734. MR1828540
[2] Bentler, P. M. and Dudgeon, P. (1996). Covariance structure analysis: Statistical
    practice, theory and directions. Ann. Review Psychology 47 563-592.
[3] Breslow, N. E. and Clayton, D. G. (1993). Approximate inference in generalized
    linear mixed models. J. Amer. Statist. Assoc. 88 9-25.
[4] Chen, H. and Chen, J. (2003). Tests for homogeneity in normal mixtures in the
    presence of a structural parameter. Statist. Sinica 13 351-365. MR1977730
[5] Chen, H., Chen, J. and Kalbfleisch, J. D. (2001). A modified likelihood ratio test
    for homogeneity in finite mixture models. J. R. Stat. Soc. Ser. B Stat. Methodol.
    63 19-29. MR1811988
[6] Chen, H., Chen, J. and Kalbfleisch, J. D. (2004). Testing for a finite mixture
    model with two components. J. R. Stat. Soc. Ser. B Stat. Methodol. 66 95-115.
    MR2035761
[7] Chen, Z. and Dunson, D. B. (2003). Random effect selection in linear mixed models.
    Biometrics 59 762-769. MR2025100
[8] Commenges, D. and Jacqmin-Gadda, H. (1997). Generalized score test of
    homogeneity based on correlated random effects models. J. Roy. Statist. Soc. Ser. B
    59 157-171. MR1436561
[9] Cox, D. R. (1983). Some remarks on overdispersion. Biometrika 70 269-274.
    MR0742997
[10] Crainiceanu, C. M. and Ruppert, D. (2004). Likelihood ratio tests in linear mixed
    models with one variance component. J. R. Stat. Soc. Ser. B Stat. Methodol. 66
    165-185. MR2035765
[11] Daniels, M. J. and Pourahmadi, M. (2002). Bayesian analysis of covariance
    matrices and dynamic models for longitudinal data. Biometrika 89 553-566.
    MR1929162
[12] de Jong, P. (1987). A central limit theorem for generalized quadratic forms. Probab.
    Theory Related Fields 75 261-277. MR0885466
[13] Dehling, H., Mikosch, T. and Sørensen, M., eds. (2002). Empirical Process
    Techniques for Dependent Data. Birkhäuser, Boston. MR1958776
[14] Fan, J. (1996). Test of significance based on wavelet thresholding and Neyman's
    truncation. J. Amer. Statist. Assoc. 91 674-688. MR1395735
[15] Guttorp, P. and Lockhart, R. A. (1988). On the asymptotic distribution
    of quadratic forms in uniform order statistics. Ann. Statist. 16 433-449.
    MR0924879
[16] Hall, D. B. and Præstgaard, J. T. (2001). Order-restricted score tests for
    homogeneity in generalised linear and nonlinear mixed models. Biometrika 88
    739-751. MR1859406
[17] Jacqmin-Gadda, H. and Commenges, D. (1995). Tests of homogeneity for
    generalized linear models. J. Amer. Statist. Assoc. 90 1237-1246. MR1379466
[18] Jiang, J. (1996). REML estimation: Asymptotic behavior and related topics. Ann.
    Statist. 24 255-286. MR1389890
[19] Jørgensen, B. (1987). Exponential dispersion models (with discussion). J. Roy.
    Statist. Soc. Ser. B 49 127-162. MR0905186
[20] Kosorok, M. R. (2003). Bootstraps of sums of independent but not identically
    distributed stochastic processes. J. Multivariate Anal. 84 299-318. MR1965224
[21] Liang, K. (1987). A locally most powerful test for homogeneity with many strata.
    Biometrika 74 259-264. MR0903126
[22] Lin, X. (1997). Variance component testing in generalised linear models with random
    effects. Biometrika 84 309-326. MR1467049
[23] Merikangas, K. R., Stevens, D. E., Fenton, B., Stolar, M., O'Malley, S.,
    Woods, S. and Risch, N. (1998). Co-morbidity and familial aggregation of
    alcoholism and anxiety disorders. Psychological Medicine 28 773-788.
[24] Mikosch, T. (1991). Functional limit theorems for random quadratic forms.
    Stochastic Process. Appl. 37 81-98. MR1091696
[25] Pollard, D. (1990). Empirical Processes: Theory and Applications. IMS, Hayward,
    CA. MR1089429
[26] Sen, P. K. and Silvapulle, M. J. (2002). An appraisal of some aspects of statistical
    inference under inequality constraints. J. Statist. Plann. Inference 107 3-43.
    MR1927753
[27] Silvapulle, M. J. and Silvapulle, P. (1995). A score test against one-sided
    alternatives. J. Amer. Statist. Assoc. 90 342-349. MR1325141
[28] Tsai, M. (1992). On the power superiority of likelihood ratio tests for restricted
    alternatives. J. Multivariate Anal. 42 102-109. MR1177520
[29] van der Vaart, A. W. and Wellner, J. A. (1996). Weak Convergence and
    Empirical Processes. With Applications to Statistics. Springer, New York. MR1385671
[30] Vaida, F. and Xu, R. (2000). Proportional hazards model with random effects.
    Statistics in Medicine 19 3309-3324.
[31] Varberg, D. E. (1968). Almost sure convergence of quadratic forms in independent
    random variables. Ann. Math. Statist. 39 1502-1506. MR0230359
[32] Verbeke, G. and Molenberghs, G. (2003). The use of score tests for inference on
    variance components. Biometrics 59 254-262. MR1987392
[33] Wei, B. (1998). Exponential Family Nonlinear Models. Springer, Berlin.
[34] Zhang, H., Feng, R. and Zhu, H. (2003). A latent variable model of segregation
    analysis for ordinal traits. J. Amer. Statist. Assoc. 98 1023-1034. MR2041490
[35] Zhang, H. and Merikangas, K. (2000). A frailty model of segregation analysis:
    Understanding the familial transmission of alcoholism. Biometrics 56 815-823.
[36] Zheng, G. and Chen, Z. (2005). Comparison of maximum statistics for hypothesis
    testing when a nuisance parameter is present only under the alternative.
    Biometrics 61 254-258. MR2135868
[37] Zhu, H. and Zhang, H. (2004). Hypothesis testing in mixture regression models.
    J. R. Stat. Soc. Ser. B Stat. Methodol. 66 3-16. MR2035755
[38] Zhu, H. and Zhang, H. (2005). Generalized score test of homogeneity for mixed
    effects models: Supplement. Technical report, Yale Univ. School of Medicine.
    Available at peace.med.yale.edu.






MRI Unit 

Department of Psychiatry 
Columbia University Medical Center 
and 

New York State Psychiatric Institute 

1051 Riverside Drive 

New York, New York 10032 

USA 

E-MAIL: hz2114@columbia.edu 



Department of Epidemiology 

and Public Health 
Yale University School of Medicine 
60 College Street 
New Haven, Connecticut 06520-8034 
USA 

AND 

Jiangxi Normal University 

Nanchang 

China 

E-mail: heping.zhang@yale.edu



